Monday, January 26, 2015

TNSPING of 11g RAC database got very slow response

Business users complain it took several minutes to log into database. It used to take several seconds.

Confirmed informations as following:

Server: Solaris 10 SPARC + Oracle Database Server 11.2.0.3 RAC
Client: windows 7 + Oracle Client 11.2.0.3

Testing connectivity from workstation:
C:\>tnsping RACPD

TNS Ping Utility for 32-bit Windows: Version 11.2.0.3.0 - Production on 26-JAN-2015 10:49:15

Copyright (c) 1997, 2011, Oracle. All rights reserved.

Used parameter files:
c:\app\oracle\product\11.2.0\client11203\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=host-scan.dbaplus.ca)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=RACPD)))
OK (120000 msec)

Feel like the SCAN listener(s) is/are alive and there is network issue. If it is network problem, the issue should be reproduced on all three IP addresses bound with SCAN name (host-scan.dbaplus.ca). Let's find IPs first:
C:\>nslookup host-scan.dbaplus.ca
Server: ns01.dbaplus.ca
Address: 10.10.2.253

Name: host-scan.dbaplus.ca
Addresses: 10.10.2.30
10.10.2.31
10.10.2.29

Checking the IP availability:
C:\>ping 10.10.2.30

Pinging 10.10.2.30 with 32 bytes of data:
Reply from 10.10.2.30: bytes=32 time=5ms TTL=252
Reply from 10.10.2.30: bytes=32 time<1ms TTL=252
Reply from 10.10.2.30: bytes=32 time<1ms TTL=252
Reply from 10.10.2.30: bytes=32 time<1ms TTL=252

Ping statistics for 10.10.2.30:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 5ms, Average = 1ms

C:\>ping 10.10.2.31

Pinging 10.10.2.31 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 10.10.2.31:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

C:\>ping 10.10.2.29

Pinging 10.10.2.29 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 10.10.2.29:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

Two of three SCAN virtual IPs are timed out, the other one (10.10.2.30) is working. It is not network problem. Log onto database servers:
$ . oraenv
ORACLE_SID = [grid] ? +ASM1
The Oracle base has been set to /u01/app/oracle
$
$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.FRA.dg
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.LISTENER.lsnr
ONLINE ONLINE host01
ONLINE OFFLINE host02
ora.asm
ONLINE ONLINE host01 Started
ONLINE ONLINE host02 Started
ora.gsd
OFFLINE OFFLINE host01
OFFLINE OFFLINE host02
ora.net1.network
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.ons
ONLINE ONLINE host01
ONLINE ONLINE host02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE host02
ora.LISTENER_SCAN2.lsnr
1 ONLINE OFFLINE
ora.LISTENER_SCAN3.lsnr
1 ONLINE OFFLINE
ora.racdb.db
1 ONLINE ONLINE host01 Open
2 ONLINE ONLINE host02 Open
ora.host01.vip
1 ONLINE ONLINE host01
ora.host02.vip
1 ONLINE UNKNOWN host01
ora.racdb.db
1 ONLINE ONLINE host01 Open
2 ONLINE ONLINE host02 Open
ora.scan1.vip
1 ONLINE ONLINE host02
ora.scan2.vip
1 ONLINE UNKNOWN host01
ora.scan3.vip
1 ONLINE UNKNOWN host01

SCAN VIP scan2 & scan3 are in UNKNOWN status, corresponding SCAN listeners LISTENER_SCAN2 & LISTENER_SCAN3 are OFFLINE.

When I connect to database from workstation, Oracle client will work as following:
1. Ask DNS server for IPs for the SCAN name host-scan.dbaplus.ca, and get three IPs
2. Choose one of them to connect. If I am lucky, the one should be SCAN1 and I will not find any issues. Actually, I was not. scan2 (or scan3) was chosen and the request waited for response from server until time out because the SCAN VIP is not ONLINE, then failed over to another SCAN VIP scan3 (or scan2) or scan1. The worst thing is failover to scan3, it will get timeout again and have to fail over finally to scan1.
3. No matter where it starts, SCAN VIP scan1 will be the final and only choice for the connection. How fast the connection can be established depends on how fast the ONLINE SCAN VIP scan1 is chosen.

To fix the issue, try to start them:
$ srvctl start scan_listener -i 2
$ srvctl start scan_listener -i 3
$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node host02
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node host01
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node host01

$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.FRA.dg
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.LISTENER.lsnr
ONLINE ONLINE host01
ONLINE OFFLINE host02
ora.asm
ONLINE ONLINE host01 Started
ONLINE ONLINE host02 Started
ora.gsd
OFFLINE OFFLINE host01
OFFLINE OFFLINE host02
ora.net1.network
ONLINE ONLINE host01
ONLINE ONLINE host02
ora.ons
ONLINE ONLINE host01
ONLINE ONLINE host02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE host02
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE host01
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE host01
ora.racdb.db
1 ONLINE ONLINE host01 Open
2 ONLINE ONLINE host02 Open
ora.host01.vip
1 ONLINE ONLINE host01
ora.host02.vip
1 ONLINE UNKNOWN host01
ora.racdb.db
1 ONLINE ONLINE host01 Open
2 ONLINE ONLINE host02 Open
ora.scan1.vip
1 ONLINE ONLINE host02
ora.scan2.vip
1 ONLINE ONLINE host01
ora.scan3.vip
1 ONLINE ONLINE host01

All SCAN VIP and SCAN listeners are ONLINE. Back to workstation and test the connection:
C:\>tnsping RACPD

TNS Ping Utility for 32-bit Windows: Version 11.2.0.3.0 - Production on 26-JAN-2015 10:49:15

Copyright (c) 1997, 2011, Oracle. All rights reserved.

Used parameter files:
c:\app\oracle\product\11.2.0\client11203\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=host-scan.dbaplus.ca)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=RACPD)))
OK (10 msec)

'10 msec' response time is not bad. The issue got fixed.

No comments: