Sunday, May 16, 2021

Grid Infrastructure 12.2: restore OCR, Voting File and mgmtdb when the disk group is corrupted

For some reason, Oracle desupported placing OCR and voting files directly on a shared file system starting with Grid Infrastructure 12.2, and only rescinded that desupport for Standalone Clusters in 19.3. Therefore, when GI 12.2 is installed, the OCR, voting file, and OCR backup location are configured in an ASM disk group by default, most likely in the same disk group. When that disk group cannot be mounted for any reason, the cluster can no longer be brought up.

Technically, GI would have to be re-configured as if it were a new installation, which could be a big job. However, it may not be that bad. Although 12.2 does not allow you to place the OCR backup location outside an ASM disk group, a DBA can still copy the OCR backup file from the ASM disk group to a file system with the 'asmcmd cp' command. I am going to demonstrate how to maximize the chance of bringing the cluster back up without rebuilding or reinstalling it.
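
For example, the latest automatic OCR backup shown later in this article could be copied out to a local directory like this (run asmcmd as the grid user; the destination directory /backup/ocr is only an example):

  asmcmd cp +DATA/cluster01/OCRBACKUP/backup00.ocr.272.1072571013 /backup/ocr/backup00.ocr.272.1072571013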

1. Retrieve OCR backup from corrupted disk group


If you do not have an OCR backup outside the corrupted disk group, or the backup is too old, try to find the latest OCR backup inside the corrupted disk group as follows,

* Find the disk devices that make up the disk group from the ASM alert log file
Log file name "<DIAGNOSTIC_DEST>/diag/asm/+asm/<ASM_INSTANCE_NAME>/trace/alert_<ASM_INSTANCE_NAME>.log"

Sample contents from the alert file "/u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log"
2021-05-15T20:14:16.982466-04:00
NOTE: Diskgroup used for Voting files is:
  DATA
NOTE: Diskgroup used for OCR is:DATA
NOTE: cache registered group DATA 1/0x7536B4AB
NOTE: cache began mount (first) of group DATA 1/0x7536B4AB
NOTE: Assigning number (1,1) to disk (/dev/mapper/ASMDATA2)
NOTE: Assigning number (1,0) to disk (/dev/mapper/ASMDATA1)

It shows that disk group DATA is used for OCR and Voting files, and that DATA consists of the following two disks

   /dev/mapper/ASMDATA2
   /dev/mapper/ASMDATA1
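
If the alert log is large, one way to locate these entries is to search for the disk group and disk assignment messages, for example (using the alert log path from the sample above):

  grep -E 'Diskgroup used for|Assigning number' /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log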

* Find the OCR backup files by running the following command as root, whether or not the Clusterware is running

  <GI_HOME>/bin/ocrconfig -showbackup

Sample output
# /u01/app/12.2.0/grid/bin/ocrconfig -showbackup
PROT-26: Oracle Cluster Registry backup locations were retrieved from a local copy

rac01    2021/05/15 00:23:38     +DATA:/cluster01/OCRBACKUP/backup00.ocr.272.1072571013     1211615871
rac01
rac01    2021/05/14 20:23:30     +DATA:/cluster01/OCRBACKUP/backup01.ocr.269.1072556603     1211615871
rac01
rac01    2021/05/14 20:23:30     +DATA:/cluster01/OCRBACKUP/day.ocr.270.1072556611     1211615871
rac01
rac01    2021/05/14 20:23:30     +DATA:/cluster01/OCRBACKUP/week.ocr.271.1072556611     1211615871

The latest backup was created at 2021/05/15 00:23:38 in disk group "DATA" with file number "272".

* Extract the OCR backup from the corrupted disk group by running the following command as root

  <GI_HOME>/bin/amdu -diskstring '<disk-device-name>' -extract <disk-group-name>.<file-number>

Here, <disk-device-name>, <disk-group-name> and <file-number> were identified in the previous steps.

Sample output
# /u01/app/12.2.0/grid/bin/amdu -diskstring '/dev/mapper/ASMDATA2' -extract DATA.272
amdu_2021_05_16_11_02_47/
AMDU-00208: File directory block not found. Cannot extract file DATA.272.
ERROR: no read quorum in group: required 1, found 0 disks

# /u01/app/12.2.0/grid/bin/amdu -diskstring '/dev/mapper/ASMDATA1' -extract DATA.272
amdu_2021_05_16_11_04_01/
AMDU-00216: 0xbadfda7a written to 1560 blocks of file DATA_272.f

# ls -l amdu_2021_05_16_11_04_01
total 12392
-rw-r--r-- 1 root root 12681216 May 16 11:04 DATA_272.f
-rw-r--r-- 1 root root     4365 May 16 11:04 report.txt

The "amdu" found the OCR backup from disk "/dev/mapper/ASMDATA1" (first disk of disk group DATA) and created two files,

  DATA_272.f     copy of OCR backup
  report.txt     Summary of the extract operation

File 'DATA_272.f' will be used for OCR restore.
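
Optionally, you can sanity-check that the extracted file is a readable OCR backup before proceeding by dumping it with ocrdump as root (the path below assumes the extract directory was created under /u01, as in the restore step later in this article):

  <GI_HOME>/bin/ocrdump -stdout -backupfile /u01/amdu_2021_05_16_11_04_01/DATA_272.f | more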

2. Create new disk group


Before creating the new disk group, follow the Oracle installation documentation to get the new disks ready. Then re-create the disk group as follows,

* Stop Oracle Clusterware by running the following command as root on all nodes of the cluster

  <GI_HOME>/bin/crsctl stop crs

If the command returns an error due to OCR corruption, run the following command as root on all nodes instead,

  <GI_HOME>/bin/crsctl stop crs -f

* Start the Oracle Clusterware stack on one node in exclusive mode by running the following command as root

  <GI_HOME>/bin/crsctl start crs -excl -nocrs

The -nocrs option ensures that the CRSD process and OCR do not start with the rest of the Oracle Clusterware stack. Ignore any errors that display.

Check whether CRSD is running by running the following command

  <GI_HOME>/bin/crsctl status resource ora.crsd -init


If CRSD is running, then stop it by running the following command as root,

  <GI_HOME>/bin/crsctl stop resource ora.crsd -init

Sample output
[root@rac01]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'rac01'
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'
CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded
CRS-2676: Start of 'ora.evmd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'
CRS-2676: Start of 'ora.gpnpd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac01'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac01'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.drivers.acfs' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded

[root@rac01]# crsctl status resource ora.crsd
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[root@rac01]#
[root@rac01]# crsctl status resource ora.crsd -init
NAME=ora.crsd
TYPE=ora.crs.type
TARGET=OFFLINE
STATE=OFFLINE

* Check whether the ASM instance is running with the command "ps -ef | grep asm_pmon"
[root@rac01]# ps -ef | grep asm_pmon | grep -v grep
grid  17207     1  0 12:36 ?        00:00:00 asm_pmon_+ASM1

ASM instance "+ASM1" is running as user "grid", the OS process id is 17207.

Log in to the system as "grid" and check the GI home from which the ASM instance is running,

   pwdx <pid>

If the pwdx command does not exist on your system, use the following instead,

  ls -l /proc/<pid>/cwd

Sample output
[grid@rac01]$ pwdx 17207
17207: /u01/app/12.2.0/grid/dbs

[grid@rac01]$ ls -l /proc/17207/cwd
lrwxrwxrwx 1 grid oracle 0 May 16 12:45 /proc/17207/cwd -> /u01/app/12.2.0/grid/dbs

The GI home is "/u01/app/12.2.0/grid".

* Connect to instance "+ASM1" with sqlplus as sysasm
[grid@rac01]$ export ORACLE_SID='+ASM1'
[grid@rac01]$ export ORACLE_HOME=/u01/app/12.2.0/grid
[grid@rac01]$ export PATH=$ORACLE_HOME/bin:$PATH

[grid@rac01]$ sqlplus / as sysasm

SQL*Plus: Release 12.2.0.1.0 Production on Sun May 16 13:03:53 2021

Copyright (c) 1982, 2016, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL>

* List the candidate disks that can be used to create the disk group, in sqlplus
SQL> col path for a30
SQL> select group_number,path,state,header_status,mount_status from v$asm_disk;

GROUP_NUMBER PATH                           STATE    HEADER_STATU MOUNT_S
------------ ------------------------------ -------- ------------ -------
           0 /dev/mapper/NEWDATA1           NORMAL   CANDIDATE    CLOSED
           0 /dev/mapper/NEWDATA2           NORMAL   CANDIDATE    CLOSED

If your new disks are not listed, check and make sure that the value of the parameter "asm_diskstring" includes the path of your new disks.

Example value of "asm_diskstring"
SQL> show parameter asm_diskstring

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      AFD:*, /dev/mapper/*

Oracle will search the path /dev/mapper and AFD disks for candidate disks. Therefore, the following new disks can be listed in view v$asm_disk,

  /dev/mapper/NEWDATA1
  /dev/mapper/NEWDATA2

Note: if the new disk path is not covered by "asm_diskstring" and you decide to configure it manually with "alter system set asm_diskstring", make sure the value will be kept across ASM instance restarts. If the original ASM instance saved its spfile in the same disk group as the OCR, that spfile is not accessible either. Most likely, the currently running ASM instance was started with default values, without an spfile or pfile. If that is the case, it is better to create a temporary pfile 'init+ASM.ora' under the directory <GI_HOME>/dbs (a minimal example follows).
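
A minimal temporary pfile could look like the following; the asm_diskstring value must match your own disk path, and the remaining parameters are only illustrative:

  # /u01/app/12.2.0/grid/dbs/init+ASM.ora -- temporary pfile, example values
  *.instance_type='asm'
  *.asm_diskstring='AFD:*','/dev/mapper/*'
  *.asm_power_limit=1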

* Create disk group DATA
SQL> create diskgroup DATA external redundancy
  2   disk '/dev/mapper/NEWDATA1' size 200g
  3   disk '/dev/mapper/NEWDATA2' size 200g;

Diskgroup created.

SQL> select g.name,d.path,d.state,d.header_status,d.mount_status
  2   from v$asm_diskgroup g, v$asm_disk d
  3  where g.group_number(+)=d.group_number;

NAME       PATH                           STATE    HEADER_STATU MOUNT_S
---------- ------------------------------ -------- ------------ -------
DATA       /dev/mapper/NEWDATA1           NORMAL   MEMBER       CACHED
DATA       /dev/mapper/NEWDATA2           NORMAL   MEMBER       CACHED

I am using external redundancy because the disks are multipath SAN disks, so ASM mirroring is not necessary. If your disks are not RAID-protected, ASM mirroring, at least NORMAL REDUNDANCY, is strongly recommended.
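
For reference, a normal-redundancy version of the same disk group could be created like this (the failure group names fg1/fg2 are only examples):

SQL> create diskgroup DATA normal redundancy
  2   failgroup fg1 disk '/dev/mapper/NEWDATA1' size 200g
  3   failgroup fg2 disk '/dev/mapper/NEWDATA2' size 200g;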

Make sure the newly created disk group is mounted,
SQL> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED

3. Restore OCR to disk group DATA


* Restore the OCR by running the following command as root

  <GI_HOME>/bin/ocrconfig -restore <Retrieved-OCR-backup>

Here, <Retrieved-OCR-backup> is the OCR backup file (/u01/amdu_2021_05_16_11_04_01/DATA_272.f) extracted from the corrupted disk group in the first step.

Example output
[root@rac01]# /u01/app/12.2.0/grid/bin/ocrconfig -restore /u01/amdu_2021_05_16_11_04_01/DATA_272.f
[root@rac01]# 

You may have to repeat step 1 to extract another OCR backup file from the corrupted disk group if the command returns errors like the following
[root@rac01]# /u01/app/12.2.0/grid/bin/ocrconfig -restore /u01/amdu_2021_05_16_11_04_01/DATA_272.f
PROT-40: The specified file '/u01/amdu_2021_05_16_11_04_01/DATA_272.f' has an invalid format
PROC-26: Error while accessing the physical storage Storage layer error [Insufficient quorum to open OCR devices] [0]

Note: If the new disk group is named differently from the corrupted one, "ocrconfig -restore" will fail with errors like the following
[root@rac01]# /u01/app/12.2.0/grid/bin/ocrconfig -restore /u01/amdu_2021_05_16_11_04_01/DATA_272.f
PROT-35: The configured OCR locations are not accessible.
PROC-26: Error while accessing the physical storage Storage layer error [Insufficient quorum to open OCR devices] [0]

As a workaround, you can manually update the OCR location information by editing the file "/etc/oracle/ocr.loc" as follows

Back up the file with the command

  cp -p /etc/oracle/ocr.loc /etc/oracle/ocr.loc.bak

Edit the file /etc/oracle/ocr.loc, and replace the "ocrconfig_loc" line with the following line

  ocrconfig_loc=<new_disk_group>

If the new disk group is named "DATANEW", sample output
[root@rac01]# cat /etc/oracle/ocr.loc
ocrconfig_loc=+DATA/cluster01/OCRFILE/registry.255.1072473645
local_only=false

[root@rac01]# cp -p /etc/oracle/ocr.loc /etc/oracle/ocr.loc.bak

[root@rac01]# sed -i '/ocrconfig_loc=/c\ocrconfig_loc=+DATANEW' /etc/oracle/ocr.loc

[root@rac01]# cat /etc/oracle/ocr.loc
ocrconfig_loc=+DATANEW
local_only=false

After the file has been changed with the new disk group name, re-run the "ocrconfig -restore" command.

Note: The same file also needs to be edited on all other cluster nodes if the new disk group name is different from the corrupted disk group's name.

* Verify the integrity of the newly restored OCR by running the following command as root

  <GI_HOME>/bin/ocrcheck

Example output
[root@rac01]# /u01/app/12.2.0/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :      11948
         Available space (kbytes) :     397620
         ID                       : 1063869578
         Device/File Name         :      +DATA
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

It shows that neither an integrity issue nor logical corruption was found. The OCR restore succeeded.

* Stop Oracle Clusterware

  <GI_HOME>/bin/crsctl stop crs -f

Example output
[root@rac01]# /u01/app/12.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac01'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac01'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac01'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac01'
CRS-2677: Stop of 'ora.evmd' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac01'
CRS-2673: Attempting to stop 'ora.asm' on 'rac01'
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac01' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac01' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac01' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac01' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac01'
CRS-2677: Stop of 'ora.cssd' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac01'
CRS-2677: Stop of 'ora.gipcd' on 'rac01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac01' has completed
CRS-4133: Oracle High Availability Services has been stopped.

* If the OCR was restored to a new disk group whose name is different from the corrupted disk group, edit the file "/etc/oracle/ocr.loc" on all other nodes as you did on the first node (see the example below).
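
For example, assuming passwordless ssh as root and a second node named rac02 (a hypothetical node name), the updated file can simply be copied over:

  scp -p /etc/oracle/ocr.loc rac02:/etc/oracle/ocr.loc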

Note: Do not start Clusterware on any nodes.

4. Recreate Voting files


* Run the following command as root on only one node to start the Oracle Clusterware stack in exclusive mode, which does not require the voting files to be present or usable

  <GI_HOME>/bin/crsctl start crs -excl

Example output
[root@rac01]# /u01/app/12.2.0/grid/bin/crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'rac01'
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'
CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded
CRS-2676: Start of 'ora.evmd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'
CRS-2676: Start of 'ora.gpnpd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac01'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac01'
CRS-2676: Start of 'ora.storage' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac01'
CRS-2676: Start of 'ora.crsd' on 'rac01' succeeded

* Verify the currently defined voting files

  <GI_HOME>/bin/crsctl query css votedisk

Example output
[root@rac01]# /u01/app/12.2.0/grid/bin/crsctl query css votedisk
Located 0 voting disk(s).

No voting disk (file) is found because the original voting file was located in the corrupted disk group. Therefore, the Clusterware cannot be started normally until the voting file is re-created.

* Recreate the Voting files by running the following command as root

  <GI_HOME>/bin/crsctl replace votedisk <disk_group>

Example output
[root@rac01]# /u01/app/12.2.0/grid/bin/crsctl replace votedisk +DATA
Successful addition of voting disk 0a66a3368a0e4fb2bfef7de9764d3d8b.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced

* Re-verify the currently defined voting files
[root@rac01]# /u01/app/12.2.0/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name              Disk group
--  -----    -----------------                ---------------------- ---------
 1. ONLINE   0a66a3368a0e4fb2bfef7de9764d3d8b (/dev/mapper/newDATA1) [DATA]
Located 1 voting disk(s).

* Stop the Clusterware stack as root

  <GI_HOME>/bin/crsctl stop crs -f

Example output (here "crsctl stop has" was used, which likewise shuts down the whole stack on the node)
[root@rac01]# /u01/app/12.2.0/grid/bin/crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac01'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac01'
CRS-2677: Stop of 'ora.crsd' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'rac01'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac01'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac01'
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac01' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac01' succeeded
CRS-2677: Stop of 'ora.storage' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac01'
CRS-2677: Stop of 'ora.mdnsd' on 'rac01' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac01'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac01'
CRS-2677: Stop of 'ora.ctssd' on 'rac01' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac01'
CRS-2677: Stop of 'ora.cssd' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac01'
CRS-2677: Stop of 'ora.gipcd' on 'rac01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac01' has completed
CRS-4133: Oracle High Availability Services has been stopped.

* Restart the Oracle Clusterware stack in normal mode by running the following command as root on all nodes

  <GI_HOME>/bin/crsctl start crs
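
Once the stack is up on all nodes, you can verify the overall cluster state and the voting file, for example:

  <GI_HOME>/bin/crsctl check cluster -all
  <GI_HOME>/bin/crsctl query css votedisk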

5. Create ASM spfile in new ASM disk group


If the original ASM spfile was stored in the corrupted disk group, a new spfile has to be created in the new disk group as follows,

* Start sqlplus as the GI_HOME owner (usually "grid") and connect to the ASM instance as sysasm

Example output
[grid@rac01]$ ps -ef | grep asm_pmon | grep -v grep
grid  24084     1  0 15:53 ?        00:00:00 asm_pmon_+ASM1
[grid@rac01]$
[grid@rac01]$ . oraenv
ORACLE_SID = [grid] ? +ASM1
The Oracle base remains unchanged with value /u01/app/grid
[grid@rac01]$
[grid@rac01]$ sqlplus / as sysasm

SQL*Plus: Release 12.2.0.1.0 Production on Sun May 16 15:59:59 2021

Copyright (c) 1982, 2016, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL>

* Confirm that the instance is not using spfile
SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string

* Create spfile in new disk group
SQL> create spfile='+DATA' from pfile='/u01/app/12.2.0/grid/dbs/init+ASM.ora';

File created.

Note: make sure the pfile init+ASM.ora has been created with the proper value for the parameter "asm_diskstring".

* Restart ASM on each node of the cluster

  <GI_HOME>/bin/srvctl stop asm -force
  <GI_HOME>/bin/srvctl start asm

* Confirm that the instance is using spfile
SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATA/cluster01/ASMPARAMETERFILE/regi
                                                 stry.253.1072713895

The ASM instance is now using spfile '+DATA/cluster01/ASMPARAMETERFILE/registry.253.1072713895'.
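
As an additional check, asmcmd can show the spfile location registered in the GPnP profile; it should point at the new disk group. Run as grid:

  asmcmd spget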

6. Re-create mgmtdb as GI_HOME owner (grid)


This step is only needed if mgmtdb was configured before the disk group corruption and cannot be started after the Clusterware is restored.

Note: if you have access to My Oracle Support, download mdbutil.pl from Doc ID 2065175.1 and use it to re-create mgmtdb; it is much easier than manual creation. If you cannot download the script, continue with this article to re-create mgmtdb manually.
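
If you do use the script, the re-create invocation is roughly of the form below; I am quoting the flags from memory of the MOS note rather than from this restore, so verify them against the note before running:

  ./mdbutil.pl --addmdb --target=+DATA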

Confirm whether mgmtdb was configured before with the command,

  <GI_HOME>/bin/srvctl status mgmtdb

If the command returns output like the following,
[grid@rac01]$ srvctl status mgmtdb
Database is enabled
Database is not running.

"Databse is enalbed" - means original Clusterware was configured with mgmtdb

"Database is not running" - for some reason, mgmtdb is not running, try to start it with command following command

  <GI_HOME>/bin/srvctl start mgmtdb

If the database was not created in the corrupted disk group, it should be able to start. If it cannot be started, follow the instructions in this step to re-create it.

* Make sure listener mgmtlsnr is not running

  <GI_HOME>/bin/srvctl status mgmtlsnr

If the listener is running, stop it

  <GI_HOME>/bin/srvctl stop mgmtlsnr

Example output
[grid@rac01]$ srvctl status mgmtlsnr
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): rac01

[grid@rac01]$ srvctl stop mgmtlsnr

[grid@rac01]$ srvctl status mgmtlsnr
Listener MGMTLSNR is enabled
Listener MGMTLSNR is not running

* Start Oracle Database QoS Management server resource (qosmserver) if it is enabled and not running

Check the status of qosmserver,

  <GI_HOME>/bin/srvctl status qosmserver

If the service is enabled but not running, start it with the command

  <GI_HOME>/bin/srvctl start qosmserver

Example output
[grid@rac01]$ srvctl status qosmserver
QoS Management Server is enabled.
QoS Management Server is not running.

[grid@rac01]$ srvctl start qosmserver

[grid@rac01]$ srvctl status qosmserver
QoS Management Server is enabled.
QoS Management Server is running on node rac01.

* Remove mgmtdb resource from OCR

  <GI_HOME>/bin/srvctl remove mgmtdb -force

Note: If the resource is not removed, dbca will fail.

* Create mgmtdb with dbca as follows

  <GI_HOME>/bin/dbca -silent -createDatabase -createAsContainerDatabase true \
        -templateName MGMTSeed_Database.dbc -sid -MGMTDB -gdbName _mgmtdb \
        -storageType ASM -diskGroupName <disk_group> \
        -datafileJarLocation <GI_HOME>/assistants/dbca/templates \
        -characterset AL32UTF8 -autoGeneratePasswords -skipUserTemplateCheck

Example command output
[grid@rac01]$ dbca -silent -createDatabase -createAsContainerDatabase true \
> -templateName MGMTSeed_Database.dbc -sid -MGMTDB -gdbName _mgmtdb \
> -storageType ASM -diskGroupName +DATA \
> -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates \
> -characterset AL32UTF8 -autoGeneratePasswords -skipUserTemplateCheck
Registering database with Oracle Grid Infrastructure
5% complete
Copying database files
6% complete
7% complete
22% complete
36% complete
Creating and starting Oracle instance
37% complete
39% complete
43% complete
44% complete
45% complete
49% complete
53% complete
54% complete
55% complete
58% complete
Completing Database Creation
59% complete
62% complete
65% complete
68% complete
Executing Post Configuration Actions
100% complete
Look at the log file "/u01/app/grid/cfgtoollogs/dbca/_mgmtdb/_mgmtdb4.log" for further details.

* Configure mgmtdb with command

  <GI_HOME>/bin/mgmtca -local
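
After mgmtca completes, you can confirm that the management database and its listener are back with:

  <GI_HOME>/bin/srvctl status mgmtdb
  <GI_HOME>/bin/srvctl status mgmtlsnr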

At this point, the Clusterware has been restored successfully. The next step is to restore/recover the databases, which every DBA knows how to do.
