Showing posts with label RAC. Show all posts
Showing posts with label RAC. Show all posts

Monday, January 26, 2015

TNSPING of 11g RAC database got very slow response

Business users complain it took several minutes to log into database. It used to take several seconds.

Confirmed informations as following:

Server: Solaris 10 SPARC + Oracle Database Server 11.2.0.3 RAC
Client: windows 7 + Oracle Client 11.2.0.3

Testing connectivity from workstation:
C:\>tnsping RACPD

TNS Ping Utility for 32-bit Windows: Version 11.2.0.3.0 - Production on 26-JAN-2015 10:49:15

Copyright (c) 1997, 2011, Oracle. All rights reserved.

Used parameter files:
c:\app\oracle\product\11.2.0\client11203\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=host-scan.dbaplus.ca)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=RACPD)))
OK (120000 msec)

Feel like the SCAN listener(s) is/are alive and there is network issue. If it is network problem, the issue should be reproduced on all three IP addresses bound with SCAN name (host-scan.dbaplus.ca). Let's find IPs first:
C:\>nslookup host-scan.dbaplus.ca
Server: ns01.dbaplus.ca
Address: 10.10.2.253

Name: host-scan.dbaplus.ca
Addresses: 10.10.2.30
10.10.2.31
10.10.2.29

Checking the IP availability:
C:\>ping 10.10.2.30

Pinging 10.10.2.30 with 32 bytes of data:
Reply from 10.10.2.30: bytes=32 time=5ms TTL=252
Reply from 10.10.2.30: bytes=32 time<1ms TTL=252
Reply from 10.10.2.30: bytes=32 time<1ms TTL=252
Reply from 10.10.2.30: bytes=32 time<1ms TTL=252

Ping statistics for 10.10.2.30:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 5ms, Average = 1ms

C:\>ping 10.10.2.31

Pinging 10.10.2.31 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 10.10.2.31:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

C:\>ping 10.10.2.29

Pinging 10.10.2.29 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 10.10.2.29:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

Two of three SCAN virtual IPs are timed out, the other one (10.10.2.30) is working. It is not network problem. Log onto database servers:
$ . oraenv
ORACLE_SID = [grid] ? +ASM1
The Oracle base has been set to /u01/app/oracle
$
$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.FRA.dg
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.LISTENER.lsnr
ONLINE ONLINE host01
ONLINE OFFLINE host02
ora.asm
ONLINE ONLINE host01 Started
ONLINE ONLINE host02 Started
ora.gsd
OFFLINE OFFLINE host01
OFFLINE OFFLINE host02
ora.net1.network
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.ons
ONLINE ONLINE host01
ONLINE ONLINE host02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE host02
ora.LISTENER_SCAN2.lsnr
1 ONLINE OFFLINE
ora.LISTENER_SCAN3.lsnr
1 ONLINE OFFLINE
ora.racdb.db
1 ONLINE ONLINE host01 Open
2 ONLINE ONLINE host02 Open
ora.host01.vip
1 ONLINE ONLINE host01
ora.host02.vip
1 ONLINE UNKNOWN host01
ora.racdb.db
1 ONLINE ONLINE host01 Open
2 ONLINE ONLINE host02 Open
ora.scan1.vip
1 ONLINE ONLINE host02
ora.scan2.vip
1 ONLINE UNKNOWN host01
ora.scan3.vip
1 ONLINE UNKNOWN host01

SCAN VIP scan2 & scan3 are in UNKNOWN status, corresponding SCAN listeners LISTENER_SCAN2 & LISTENER_SCAN3 are OFFLINE.

When I connect to database from workstation, Oracle client will work as following:
1. Ask DNS server for IPs for the SCAN name host-scan.dbaplus.ca, and get three IPs
2. Choose one of them to connect. If I am lucky, the one should be SCAN1 and I will not find any issues. Actually, I was not. scan2 (or scan3) was chosen and the request waited for response from server until time out because the SCAN VIP is not ONLINE, then failed over to another SCAN VIP scan3 (or scan2) or scan1. The worst thing is failover to scan3, it will get timeout again and have to fail over finally to scan1.
3. No matter where it starts, SCAN VIP scan1 will be the final and only choice for the connection. How fast the connection can be established depends on how fast the ONLINE SCAN VIP scan1 is chosen.

To fix the issue, try to start them:
$ srvctl start scan_listener -i 2
$ srvctl start scan_listener -i 3
$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node host02
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node host01
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node host01

$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.FRA.dg
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.LISTENER.lsnr
ONLINE ONLINE host01
ONLINE OFFLINE host02
ora.asm
ONLINE ONLINE host01 Started
ONLINE ONLINE host02 Started
ora.gsd
OFFLINE OFFLINE host01
OFFLINE OFFLINE host02
ora.net1.network
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.ons
ONLINE ONLINE host01
ONLINE ONLINE host02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE host02
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE host01
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE host01
ora.racdb.db
1 ONLINE ONLINE host01 Open
2 ONLINE ONLINE host02 Open
ora.host01.vip
1 ONLINE ONLINE host01
ora.host02.vip
1 ONLINE UNKNOWN host01
ora.racdb.db
1 ONLINE ONLINE host01 Open
2 ONLINE ONLINE host02 Open
ora.scan1.vip
1 ONLINE ONLINE host02
ora.scan2.vip
1 ONLINE ONLINE host01
ora.scan3.vip
1 ONLINE ONLINE host01

All SCAN VIP and SCAN listeners are ONLINE. Back to workstation and test the connection:
C:\>tnsping RACPD

TNS Ping Utility for 32-bit Windows: Version 11.2.0.3.0 - Production on 26-JAN-2015 10:49:15

Copyright (c) 1997, 2011, Oracle. All rights reserved.

Used parameter files:
c:\app\oracle\product\11.2.0\client11203\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=host-scan.dbaplus.ca)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=RACPD)))
OK (10 msec)

'10 msec' response time is not bad. The issue got fixed.

Tuesday, January 14, 2014

Simple Test on RAC Cache Fusion

RAC: Oracle Database 11.2.0.3
OS: Solaris 10 Sparc 64bit

Create a test table 'TEST_CACKE' and find out its OBJECT_ID
system@RACDB.RACDB_1.rac01> create table TEST_CACHE as select * from dba_tables;
Table created.
system@RACDB.RACDB_1.rac01> create index TEST_CACHE_IDX on TEST_CACHE(owner,table_name);
Index created.
system@RACDB.RACDB_1.rac01> select object_id from dba_objects where object_name='TEST_CACHE';
OBJECT_ID
----------
1707649 <--- Object id of table TEST_CACHE

Here, I also created a two-column composite index on columns (owner, table_name) to avoid full table scan and get the block cached normally. I will use the row which columns owner='SYS' ans table_name='CON$' in table TEST_CACHE as specimen:
system@RACDB.RACDB_1.rac01> select dbms_rowid.rowid_to_absolute_fno(rowid,'SYSTEM','TEST_CACHE') file#,
2 dbms_rowid.rowid_block_number(rowid) block#,
3 owner,table_name,tablespace_name
4 from test_cache
5* where owner='SYS' and table_name='CON$';

FILE# BLOCK# OWNER TABLE_NAME TABLESPACE_NAME
---------- ---------- ------- --------------- ------------------------------
1 122897 SYS CON$ SYSTEM <--- file# and block# which contains specified row

We can find that the row specimen is stored in BLOCK 122897 of FILE 1.
Find out if the block is cached:
system@RACDB.RACDB_1.rac01> select inst_id,status from gv$bh where file#=1 and block#=122897 and objd=1707649;
INST_ID STATUS
---------- ----------
1 scur <--- block#122897 is cached on instance 1

STATUS='scur' means block#122897 containing specified row is cached on instance 1. Yes, it was loaded from disk to memory by 'create index TEST_CACHE_IDX' statement.
To empty buffer cache:
system@RACDB.RACDB_1.rac01> alter system flush buffer_cache;
System altered.
system@RACDB.RACDB_1.rac01> select inst_id,status from gv$bh where file#=1 and block#=122897 and objd=1707649;
INST_ID STATUS
---------- ----------
1 free <--- cache memory of block#122897 is freed

The cache memory of block#122897 of table TEST_CACHE(object#1707649) is freed (not cached any more)
Re-load  block#122897 into cache on instance 1:
system@RACDB.RACDB_1.rac01> set autotrace on statistics
system@RACDB.RACDB_1.rac01> select owner,table_name,tablespace_name from test_cache where owner='SYS' and table_name='CON$';
OWNER TABLE_NAME TABLESPACE_NAME
------- --------------- ------------------------------
SYS CON$ SYSTEM

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
3 physical reads <--- Incur physical reads
0 redo size
297 bytes sent via SQL*Net to client
247 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

The query caused '3  physical reads' to load table & index blocks into memory from disk.
Check cached data:
system@RACDB.RACDB_1.rac01> select inst_id,status from gv$bh where file#=1 and block#=122897 and objd=1707649;
INST_ID STATUS
---------- ----------
1 scur
1 free

block#122897 is cached on instance 1 again.
Now re-run the query and check the physical read:
system@RACDB.RACDB_1.rac01> select owner,table_name,tablespace_name from test_cache where owner='SYS' and table_name='CON$';

OWNER TABLE_NAME TABLESPACE_NAME
------------------------- ------------------------------ ------------------------------
SYS CON$ SYSTEM

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
312 bytes sent via SQL*Net to client
247 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

It is '0 physical reads' because of cache on instance 1. Ok, that is what is expected. What if I run the query from other instance of the RAC?
Check if the block is cached on instance 2:
system@RACDB.RACDB_2.rac02> select inst_id,status from gv$bh where file#=1 and block#=122897 and objd=1707649;
INST_ID STATUS
---------- ----------
1 scur
1 free

No, it is not cached on instance 2. Run the query on instance 2 and see what will happen:
system@RACDB.RACDB_2.rac02> select owner,table_name,tablespace_name from test_cache where owner='SYS' and table_name='CON$';

OWNER TABLE_NAME TABLESPACE_NAME
------- --------------- ------------------------------
SYS CON$ SYSTEM

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
295 bytes sent via SQL*Net to client
247 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

There is no cache of the block on instance 2, but Oracle did not read the data from disk (0 physical reads). Instead, it read/copy the block from cache on instance 1.
Let's see the current cache status:
system@RACDB.RACDB_2.rac02> select inst_id,status from gv$bh where file#=1 and block#=122897 and objd=1707649;

INST_ID STATUS
---------- ----------
2 scur <--- block#122897 is cached on instance 2
1 scur <--- block#122897 is cached on instance 1
1 free

Both instances caches the block block#122897.