Showing posts with label Performance. Show all posts
Showing posts with label Performance. Show all posts

Monday, August 8, 2016

Automated SQL Tuning Task Failed with ORA-00600: [kpdbSwitchPreRestore: Txn]

Database: Oracle 12.1.0.2 RAC + Proactive Bundle Patch 12.1.0.2.160419
Operation System:  AIX 7.1


Automated SQL Tuning Task (SYS_AUTO_SQL_TUNING_TASK) failed and alert log shows errors:
Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
2016-08-07 22:00:21.020000 -05:00
 <<Message truncated>>
2016-08-07 22:10:53.145000 -05:00
Errors in file /u01/app/oracle/diag/rdbms/cdb01/CDB01_1/trace/CDB01_1_j004_31719682.trc  (incident=109991) (PDBNAME=PDB01):
ORA-00600: internal error code, arguments: [kpdbSwitchPreRestore: txn], [0x7000101ECFA4F00], [0], [2685403136], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/cdb01/CDB01_1/incident/incdir_109991/CDB01_1_j004_31719682_i109991.trc
2016-08-07 22:11:11.467000 -05:00
Dumping diagnostic data in directory=[cdmp_201608071221111], requested by (instance=1, osid=85983625 (J004)), summary=[incident=109991].


The issue was not seen in non-cdb databases. According to Oracle Support explanation, the task is executed from a pluggable database but also accessing the container database through dblink to tune a remote query which triggers a bug. The bug can be fixed by patch 21283337 if it is available for your platform.

Before the patch is available or the Automated SQL Tuning Task is not really needed, as a workaround, the task (client_name is 'sql tuning advisor' in view dba_autotask_client) can be disabled,
system@PDB01> select client_name,status from dba_autotask_client;

CLIENT_NAME                                                      STATUS
---------------------------------------------------------------- --------
auto space advisor                                               ENABLED
auto optimizer stats collection                                  ENABLED
sql tuning advisor                                               ENABLED

system@PDB01> select client_name,operation_name,status from dba_autotask_operation;

CLIENT_NAME                              OPERATION_NAME                           STATUS
---------------------------------------- ---------------------------------------- --------
auto optimizer stats collection          auto optimizer stats job                 ENABLED
auto space advisor                       auto space advisor job                   ENABLED
sql tuning advisor                       automatic sql tuning task                ENABLED

system@PDB01> begin
  2  dbms_auto_task_admin.disable(client_name=>'SQL TUNING ADVISOR',
  3                               operation=>null,
  4                               window_name=>null);
  5  end;
  6  /

PL/SQL procedure successfully completed.

system@PDB01> select client_name,status from dba_autotask_client;

CLIENT_NAME                                                      STATUS
---------------------------------------------------------------- --------
sql tuning advisor                                               DISABLED
auto space advisor                                               ENABLED
auto optimizer stats collection                                  ENABLED

system@PDB01> select client_name,operation_name,status from dba_autotask_operation;

CLIENT_NAME                              OPERATION_NAME                           STATUS
---------------------------------------- ---------------------------------------- --------
auto optimizer stats collection          auto optimizer stats job                 ENABLED
auto space advisor                       auto space advisor job                   ENABLED
sql tuning advisor                       automatic sql tuning task                DISABLED


Usage notes for dbms_auto_task_admin.disable,

 * If operation and window_name are both NULL, the client is disabled.
 * If operation is not NULL, window_name is ignored and the operation is disabled
 * If operation is NULL and window_name is not NULL, the client is disabled in the specified window.

Note: If disabling the task as workaround, we have to disable it in all containers (PDB & root),

SYSTEM@PDB01> select client_name,status from dba_autotask_client;

CLIENT_NAME                                                      STATUS  
---------------------------------------------------------------- --------
auto space advisor                                               ENABLED
auto optimizer stats collection                                  ENABLED
sql tuning advisor                                               DISABLED

SYSTEM@PDB01> alter session set container=cdb$root;

Session altered.

SYSTEM@CDB$ROOT> select client_name,status from dba_autotask_client;

CLIENT_NAME                                                      STATUS  
---------------------------------------------------------------- --------
auto space advisor                                               ENABLED
auto optimizer stats collection                                  ENABLED
sql tuning advisor                                               ENABLED

SYSTEM@CDB$ROOT> select con_id,client_name,status from cdb_autotask_client;

    CON_ID CLIENT_NAME                                                      STATUS 
---------- ---------------------------------------------------------------- --------

         1 auto optimizer stats collection                                  ENABLED
         1 auto space advisor                                               ENABLED
         1 sql tuning advisor                                               ENABLED
         3 auto optimizer stats collection                                  ENABLED
         3 auto space advisor                                               ENABLED
         3 sql tuning advisor                                               DISABLED

6 rows selected.

SYSTEM@CDB$ROOT> exec dbms_auto_task_admin.disable('SQL TUNING ADVISOR',null,null);

PL/SQL procedure successfully completed.

SYSTEM@CDB$ROOT> select con_id,client_name,status from cdb_autotask_client;

    CON_ID CLIENT_NAME                                                      STATUS  
---------- ---------------------------------------------------------------- --------

         1 auto optimizer stats collection                                  ENABLED
         1 auto space advisor                                               ENABLED
         1 sql tuning advisor                                               DISABLED
         3 auto optimizer stats collection                                  ENABLED
         3 auto space advisor                                               ENABLED
         3 sql tuning advisor                                               DISABLED

6 rows selected.

Friday, December 26, 2014

ASMB process crashed the instance after 'PMON failed to acquire latch'

Environment:
Oracle Database: Enterprise Edition 11.2.0.2.12
Operating System:  AIX 6.1

Symptom:

Database alert log:
Fri Dec 26 07:23:04 2014
PMON failed to acquire latch, see PMON dump
Fri Dec 26 07:24:04 2014
PMON failed to acquire latch, see PMON dump
Fri Dec 26 07:25:36 2014
PMON failed to acquire latch, see PMON dump
Fri Dec 26 07:26:36 2014
PMON failed to acquire latch, see PMON dump
Fri Dec 26 07:28:07 2014
PMON failed to acquire latch, see PMON dump
Fri Dec 26 07:28:52 2014
NOTE: ASMB terminating
Errors in file /u01/app/oracle/diag/rdbms/oradb/ORADB/trace/ORADB_asmb_5439902.trc:
ORA-15064: communication failure with ASM instance
ORA-03135: connection lost contact
Process ID:
Session ID: 152 Serial number: 9
Errors in file /U01/app/oracle/diag/rdbms/oradb/ORADB/trace/ORADB_asmb_5439902.trc:
ORA-15064: communication failure with ASM instance
ORA-03135: connection lost contact
Process ID:
Session ID: 152 Serial number: 9
ASMB (ospid: 5439902): terminating the instance due to error 15064
Fri Dec 26 07:28:52 2014

PMON trace:
*** 2014-12-26 07:23:04.963
Location from where latch is held: kgh.h LINE:6387 ID:kghalo:
Context saved from call: 0
state=busy [holder orapid=16] wlstate=free [value=0]
waiters [orapid (seconds since: put on list, posted, alive check)]:
64 (103, 1419596584, 0)
... ...
61 (4, 1419596584, 0)
waiter count=17
gotten 38993536 times wait, failed first 285766 sleeps 71131
gotten 0 times nowait, failed: 0
Short stack dump:
... ...
possible holder pid = 16 ospid=5439902
----------------------------------------
SO: 0x7000002fec6df78, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x7000002fec6df78, name=process, file=ksu.h LINE:12451 ID:, pg=0
(process) Oracle pid:16, ser:1, calls cur/top: 0x7000002ff83fd00/0x7000002ff83fd00
flags : (0x6) SYSTEM
flags2: (0x0), flags3: (0x0)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 0 0 35
last post received-location: ksr2.h LINE:603 ID:ksrpublish
last process to post me: 7000002fec731a8 2 0
last post sent: 1051 0 2
last post sent-location: ksl2.h LINE:2293 ID:kslpsr
last process posted by me: 7000002fec69db8 1 6
(latch info) wait_event=0 bits=84
holding (efd=4) 700000000107940 Child shared pool level=7 child#=1
Location from where latch is held: kgh.h LINE:6387 ID:kghalo:
Context saved from call: 0
state=busy [holder orapid=16] wlstate=free [value=0]
waiters [orapid (seconds since: put on list, posted, alive check)]:
64 (103, 1419596584, 0)
... ...
61 (4, 1419596584, 0)
waiter count=17
holding (efd=4) 700000000047900 ASM map headers level=2
Location from where latch is held: kffm2.h LINE:306 ID:kffmAllocate:
Context saved from call: 0
state=busy [holder orapid=16] wlstate=free [value=0]
Process Group: DEFAULT, pseudo proc: 0x7000002fee6c858
O/S info: user: oragrid, term: UNKNOWN, ospid: 5439902
OSD pid info: Unix process pid: 5439902, image: oracle@host01.dbaplus.ca (ASMB)

ASMB trace:
*** 2014-12-26 07:28:52.409
NOTE: ASMB terminating
ORA-15064: communication failure with ASM instance
ORA-03135: connection lost contact
Process ID:
Session ID: 152 Serial number: 9
error 15064 detected in background process
ORA-15064: communication failure with ASM instance
ORA-03135: connection lost contact
Process ID:
Session ID: 152 Serial number: 9
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+644<-kjzdssdmp()+444<-kjzduptcctx()+272<-kjzdicrshnfy()+96<-ksuitm()+1284<-ksbrdp()+4508<-opirip()+1216<-opidrv()+1156<-sou2o()+192<-opimai_real()+308<-ssthrdmain()+340<-main()+216<-__start()+112
----- End of Abridged Call Stack Trace -----

*** 2014-12-26 07:28:52.412
ASMB (ospid: 5439902): terminating the instance due to error 15064
ksuitm: waiting up to [5] seconds before killing DIAG(8519832)

AWR report:
Cache Sizes                       Begin        End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 6,464M 6,464M Std Block Size: 16K
Shared Pool Size: 5,152M 5,152M Log Buffer: 7,172K

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 97.27 In-memory Sort %: 100.00
Library Hit %: 73.34 Soft Parse %: 63.01
Execute to Parse %: 45.26 Latch Hit %: 99.74
Parse CPU to Parse Elapsd %: 4.75 % Non-Parse CPU: 92.21


Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
latch: shared pool 836 866 1035 38.4 Concurrenc
db file sequential read 227,015 281 1 12.5 User I/O
DB CPU 280 12.4
latch free 6 161 26794 7.1 Other
library cache lock 3 89 29759 4.0 Concurrenc

Check duplicate SQL:
select PLAN_HASH_VALUE,count(*) from v$sql
where EXECUTIONS=1 group by PLAN_HASH_VALUE order by 2 desc;

251958376 34281
3946703927 33694
1997326701 2603
3410233426 1221
3460305098 304
... ...
3150591124 1

select substr(sql_text,1,50),count(1) from v$sql
where PLAN_HASH_VALUE=251958376 group by substr(sql_text,1,50);

Obviously,  the application is not designed with consideration of bind variables, the duplicate SQLs (using literals instead of bind variables) caused massive shared pool fragmentation.

Therefore, ASMB process can not find a suitable sized chunk of memory before it was considered unresponsive and thereafter killed.

Workaround:

Since application can not be changed (system too old to find developer), tried CURSOR_SHARING=FORCE and no crash happened after that.

 

Tuesday, January 14, 2014

Simple Test on RAC Cache Fusion

RAC: Oracle Database 11.2.0.3
OS: Solaris 10 Sparc 64bit

Create a test table 'TEST_CACKE' and find out its OBJECT_ID
system@RACDB.RACDB_1.rac01> create table TEST_CACHE as select * from dba_tables;
Table created.
system@RACDB.RACDB_1.rac01> create index TEST_CACHE_IDX on TEST_CACHE(owner,table_name);
Index created.
system@RACDB.RACDB_1.rac01> select object_id from dba_objects where object_name='TEST_CACHE';
OBJECT_ID
----------
1707649 <--- Object id of table TEST_CACHE

Here, I also created a two-column composite index on columns (owner, table_name) to avoid full table scan and get the block cached normally. I will use the row which columns owner='SYS' ans table_name='CON$' in table TEST_CACHE as specimen:
system@RACDB.RACDB_1.rac01> select dbms_rowid.rowid_to_absolute_fno(rowid,'SYSTEM','TEST_CACHE') file#,
2 dbms_rowid.rowid_block_number(rowid) block#,
3 owner,table_name,tablespace_name
4 from test_cache
5* where owner='SYS' and table_name='CON$';

FILE# BLOCK# OWNER TABLE_NAME TABLESPACE_NAME
---------- ---------- ------- --------------- ------------------------------
1 122897 SYS CON$ SYSTEM <--- file# and block# which contains specified row

We can find that the row specimen is stored in BLOCK 122897 of FILE 1.
Find out if the block is cached:
system@RACDB.RACDB_1.rac01> select inst_id,status from gv$bh where file#=1 and block#=122897 and objd=1707649;
INST_ID STATUS
---------- ----------
1 scur <--- block#122897 is cached on instance 1

STATUS='scur' means block#122897 containing specified row is cached on instance 1. Yes, it was loaded from disk to memory by 'create index TEST_CACHE_IDX' statement.
To empty buffer cache:
system@RACDB.RACDB_1.rac01> alter system flush buffer_cache;
System altered.
system@RACDB.RACDB_1.rac01> select inst_id,status from gv$bh where file#=1 and block#=122897 and objd=1707649;
INST_ID STATUS
---------- ----------
1 free <--- cache memory of block#122897 is freed

The cache memory of block#122897 of table TEST_CACHE(object#1707649) is freed (not cached any more)
Re-load  block#122897 into cache on instance 1:
system@RACDB.RACDB_1.rac01> set autotrace on statistics
system@RACDB.RACDB_1.rac01> select owner,table_name,tablespace_name from test_cache where owner='SYS' and table_name='CON$';
OWNER TABLE_NAME TABLESPACE_NAME
------- --------------- ------------------------------
SYS CON$ SYSTEM

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
3 physical reads <--- Incur physical reads
0 redo size
297 bytes sent via SQL*Net to client
247 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

The query caused '3  physical reads' to load table & index blocks into memory from disk.
Check cached data:
system@RACDB.RACDB_1.rac01> select inst_id,status from gv$bh where file#=1 and block#=122897 and objd=1707649;
INST_ID STATUS
---------- ----------
1 scur
1 free

block#122897 is cached on instance 1 again.
Now re-run the query and check the physical read:
system@RACDB.RACDB_1.rac01> select owner,table_name,tablespace_name from test_cache where owner='SYS' and table_name='CON$';

OWNER TABLE_NAME TABLESPACE_NAME
------------------------- ------------------------------ ------------------------------
SYS CON$ SYSTEM

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
312 bytes sent via SQL*Net to client
247 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

It is '0 physical reads' because of cache on instance 1. Ok, that is what is expected. What if I run the query from other instance of the RAC?
Check if the block is cached on instance 2:
system@RACDB.RACDB_2.rac02> select inst_id,status from gv$bh where file#=1 and block#=122897 and objd=1707649;
INST_ID STATUS
---------- ----------
1 scur
1 free

No, it is not cached on instance 2. Run the query on instance 2 and see what will happen:
system@RACDB.RACDB_2.rac02> select owner,table_name,tablespace_name from test_cache where owner='SYS' and table_name='CON$';

OWNER TABLE_NAME TABLESPACE_NAME
------- --------------- ------------------------------
SYS CON$ SYSTEM

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
295 bytes sent via SQL*Net to client
247 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

There is no cache of the block on instance 2, but Oracle did not read the data from disk (0 physical reads). Instead, it read/copy the block from cache on instance 1.
Let's see the current cache status:
system@RACDB.RACDB_2.rac02> select inst_id,status from gv$bh where file#=1 and block#=122897 and objd=1707649;

INST_ID STATUS
---------- ----------
2 scur <--- block#122897 is cached on instance 2
1 scur <--- block#122897 is cached on instance 1
1 free

Both instances caches the block block#122897.