Thursday, January 25, 2018

AIX Disk Only Visible to One ASM Instance of Grid Infrastructure

Oracle Grid Infrastructure 12.2.0.1
AIX 7.1


New disks are added to both nodes of the Grid Infrastructure cluster, and the disk permissions are configured properly:
[grid@host01]$ ls -l /dev/*hdisk[56]
brw-------      1 root      system         21, 26 Jan 22 10:58 /dev/hdisk5
brw-------      1 root      system         21, 25 Jan 22 10:58 /dev/hdisk6
crw-rw----    1 grid     asmadmin     21, 26 Jan 24 16:07 /dev/rhdisk5
crw-rw----    1 grid     asmadmin     21, 25 Jan 24 16:06 /dev/rhdisk6

[grid@host02]$ ls -l /dev/*hdisk[56]
brw-------      1 root      system          21, 25 Jan 22 11:08 /dev/hdisk5
brw-------      1 root      system          21, 31 Jan 22 11:08 /dev/hdisk6
crw-rw----    1 grid     asmadmin     21, 25 Jan 22 11:08 /dev/rhdisk5
crw-rw----    1 grid     asmadmin     21, 31 Jan 24 14:49 /dev/rhdisk6
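
AIX numbers hdisks per node, so it is worth confirming that hdisk5 and hdisk6 refer to the same LUNs on both hosts. A quick check, assuming MPIO disks that expose the unique_id attribute (the IDs should match per disk across the two nodes):
[grid@host01]$ lsattr -El hdisk5 -a unique_id
[grid@host02]$ lsattr -El hdisk5 -a unique_id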


ASMCA returned the following errors while creating diskgroup DATA01 on these two disks (rhdisk5, rhdisk6):
[DBT-30028] Generic failure interacting with CRS. Details PRCR-1079 : Failed to start resource ora.DATA01.dg
CRS-5017: The resource action "ora.DATA01.dg start" encountered the following error:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA01" cannot be mounted
ORA-15040: diskgroup is incomplete
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/host02/crs/trace/crsd_oraagent_grid.trc".

CRS-2674: Start of 'ora.DATA01.dg' on 'host02' failed

The new diskgroup DATA01 is created successfully but cannot be mounted on the second node host02. Confirm the status of the diskgroup:
[grid@host02]$ srvctl status diskgroup -diskgroup DATA01
Disk Group DATA01 is running on host01
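
The same picture is visible from CRS directly; the resource shows ONLINE on host01 and OFFLINE on host02 (output abbreviated, exact layout may vary):
[grid@host02]$ crsctl stat res ora.DATA01.dg -t
ora.DATA01.dg
               ONLINE  ONLINE       host01                   STABLE
               ONLINE  OFFLINE      host02                   STABLE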

The diskgroup is only mounted on host01. Search the ASM instance alert log on host02:
SQL> ALTER DISKGROUP DATA01 MOUNT  /* asm agent *//* {1:5322:26006} */
NOTE: cache registered group DATA01 2/0x9C2312C1
NOTE: cache began mount (not first) of group DATA01 2/0x9C2312C1
2018-01-22 11:55:24.459000 -05:00
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: cache dismounting (clean) group 2/0x9C2312C1 (DATA01)
NOTE: messaging CKPT to quiesce pins Unix process pid: 10617178, image: oracle@host02 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: LGWR not being messaged to dismount
NOTE: cache dismounted group 2/0x9C2312C1 (DATA01)
NOTE: cache ending mount (fail) of group DATA01 number=2 incarn=0x9c2312c1
NOTE: cache deleting context for group DATA01 2/0x9c2312c1
GMON dismounting group 2 at 48 for pid 48, osid 10617178
ERROR: diskgroup DATA01 was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA01" cannot be mounted
ORA-15040: diskgroup is incomplete
ERROR: ALTER DISKGROUP DATA01 MOUNT  /* asm agent *//* {1:5322:26006} */

ASM complained that it could not find the required disks on host02 ('required 2, found 0 disks'). Double-check the disks at the ASM instance level:
sys@+ASM1.host01> select group_number,disk_number,path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER PATH        
------------ ----------- ---------------------------------------
           1           0 /dev/rhdisk1
           2           0 /dev/rhdisk2
           2           1 /dev/rhdisk3
           3           0 /dev/rhdisk4
           4           0 /dev/rhdisk5
           4           1 /dev/rhdisk6

6 rows selected.

sys@+ASM2.host02> select group_number,disk_number,path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER PATH        
------------ ----------- ---------------------------------------
           1           0 /dev/rhdisk1
           2           0 /dev/rhdisk2
           2           1 /dev/rhdisk3
           3           0 /dev/rhdisk4

Disks are visible to instance +ASM1 on host01, but not to +ASM2 on host02, even though all disks are visible at the OS level on both servers (host01 & host02) and grid (the Grid Infrastructure home owner) has the right permissions on them. The culprit is the disk attribute 'reserve_policy', whose current setting stops the disks from being shared across GI cluster nodes:
[grid@host01]$ lsattr -El hdisk5 -a reserve_policy
reserve_policy PR_shared Reserve Policy True
[grid@host01]$ lsattr -El hdisk6 -a reserve_policy
reserve_policy PR_shared Reserve Policy True

[grid@host02]$ lsattr -El hdisk5 -a reserve_policy
reserve_policy PR_shared Reserve Policy True
[grid@host02]$ lsattr -El hdisk6 -a reserve_policy
reserve_policy PR_shared Reserve Policy True
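
lsattr -R lists the values the attribute accepts; the exact set depends on the disk driver, but for MPIO disks it typically looks like:
[grid@host01]$ lsattr -R -l hdisk5 -a reserve_policy
no_reserve
single_path
PR_exclusive
PR_shared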

To allow all ASM instances to access the same disk at the same time, the disk attribute 'reserve_policy' has to be set to 'no_reserve', as follows:
[grid@host01]$ srvctl stop diskgroup -diskgroup DATA01
[grid@host01]$ srvctl remove diskgroup -diskgroup DATA01
[root@host01]# chdev -l hdisk5 -a reserve_policy=no_reserve -P
[root@host01]# chdev -l hdisk6 -a reserve_policy=no_reserve -P
[root@host02]# chdev -l hdisk5 -a reserve_policy=no_reserve -P
[root@host02]# chdev -l hdisk6 -a reserve_policy=no_reserve -P
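
Note that -P writes the change to the ODM only, so it takes effect the next time the device is reconfigured (rmdev/mkdev or a reboot); if the disk is not held open, chdev can also be run without -P to apply the change immediately. Verify the new value on each node:
[root@host01]# lsattr -El hdisk5 -a reserve_policy
reserve_policy no_reserve Reserve Policy True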

Then recreate the diskgroup; the new diskgroup can now be mounted on both nodes.
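
A minimal sketch of the recreation, assuming external redundancy (the actual redundancy and attributes may differ from what ASMCA was given originally):
sys@+ASM1.host01> create diskgroup DATA01 external redundancy
                  disk '/dev/rhdisk5','/dev/rhdisk6';

sys@+ASM2.host02> alter diskgroup DATA01 mount;

[grid@host02]$ srvctl status diskgroup -diskgroup DATA01
Disk Group DATA01 is running on host01,host02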
