Scenario 2.
1. Only the OCR is in the ASM diskgroup
2. The ASM diskgroup has normal redundancy with only two failure groups
3. Both failure groups are affected
4. Vote disk is located in a separate location (as in scenario 1)
In this case an OCR restoration is required.
1. If testing, create a manual backup of the OCR with
ocrconfig -manualbackup
2. Identify the disks in the disk group where the OCR resides and corrupt them to simulate disk failure
SQL> select d.name,g.name from v$asm_disk d, v$asm_diskgroup g
where d.group_number=g.group_number and g.name='CLUSTERDG';
NAME NAME
------ ----------
CLUS1 CLUSTERDG
CLUS3 CLUSTERDG
# /etc/init.d/oracleasm querydisk -p clus1
Disk "CLUS1" is a valid ASM disk
/dev/sdc2: LABEL="CLUS1" TYPE="oracleasm"
# /etc/init.d/oracleasm querydisk -p clus3
Disk "CLUS3" is a valid ASM disk
/dev/sdc10: LABEL="CLUS3" TYPE="oracleasm"
# dd if=/dev/zero of=/dev/sdc2 count=20480 bs=8192
20480+0 records in
20480+0 records out
167772160 bytes (168 MB) copied, 0.155909 seconds, 1.1 GB/s
# dd if=/dev/zero of=/dev/sdc10 count=20480 bs=8192
20480+0 records in
20480+0 records out
167772160 bytes (168 MB) copied, 0.157382 seconds, 1.1 GB/s
3. Try checking the OCR integrity
ocrcheck
4. To hasten the detection of the OCR failure, issue the manual OCR backup command (as in step 1). This will log the OCR failure in the cluster alert log
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
2010-08-13 16:29:24.591
[crsd(15464)]CRS-1006:The OCR location +CLUSTERDG is inaccessible. Details in /opt/app/11.2.0/grid/log/hpc1/crsd/crsd.log.
2010-08-13 16:29:34.618
[/opt/app/11.2.0/grid/bin/orarootagent.bin(15599)]CRS-5822:Agent '/opt/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) in /opt/app/11.2.0/grid/log/hpc1/agent/crsd/orarootagent_root/orarootagent_root.log.
2010-08-13 16:29:35.363
[ohasd(13310)]CRS-2765:Resource 'ora.crsd' has failed on server 'hpc1'.
2010-08-13 16:29:37.385
[ohasd(13310)]CRS-2765:Resource 'ora.crsd' has failed on server 'hpc1'.
2010-08-13 16:29:39.408
[ohasd(13310)]CRS-2765:Resource 'ora.crsd' has failed on server 'hpc1'.
2010-08-13 16:29:41.429
[ohasd(13310)]CRS-2765:Resource 'ora.crsd' has failed on server 'hpc1'.
2010-08-13 16:29:55.583
[ohasd(13310)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
and in crsd.log
2010-08-13 16:29:54.798: [GIPCXCPT][1291424336] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
2010-08-13 16:29:54.841: [ OCRRAW][1291424336]propriogid:1_2: INVALID FORMAT
2010-08-13 16:29:54.841: [ OCRRAW][1291424336]proprioini: all disks are not OCR/OLR formatted
2010-08-13 16:29:54.841: [ OCRRAW][1291424336]proprinit: Could not open raw device
2010-08-13 16:29:54.844: [ OCRAPI][1291424336]a_init:16!: Backend init unsuccessful : [26]
2010-08-13 16:29:54.844: [ CRSOCR][1291424336] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
2010-08-13 16:29:54.844: [ CRSD][1291424336][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-08-13 16:29:54.844: [ CRSD][1291424336] Done.
But the cluster database instance on that node remains open for access.
5. Stop the CRS on all the nodes with the force option, since the normal CRS stop command will fail due to the OCR corruption
crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with error(s).
CRS-4000: Command Stop failed, or completed with errors.
crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hpc1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hpc1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.evmd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hpc1'
CRS-2677: Stop of 'ora.cssd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'hpc1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'hpc1'
CRS-2677: Stop of 'ora.gipcd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'hpc1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hpc1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
If the command seems stuck at CRS-2677: Stop of 'ora.ctssd' on 'hpc1' succeeded or at any other point, check if there are any sessions (sqlplus or otherwise) on the ASM instance. This would be visible in the cluster alert log with
[client(23327)]CRS-10001:Waiting for ASM to shutdown.
[client(23333)]CRS-10001:Waiting for ASM to shutdown.
[client(23340)]CRS-10001:Waiting for ASM to shutdown.
[client(23346)]CRS-10001:Waiting for ASM to shutdown.
Stop the database instance and the ASM instance manually, and the crs stop command will then run to completion.
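To spot such lingering sessions, here is a minimal sketch; the `find_asm_clients` helper and its patterns are illustrative, not an Oracle-supplied tool, so tune the patterns to your environment:

```shell
#!/bin/sh
# Hypothetical helper: filter a "pid args" process listing for processes
# that commonly hold ASM connections (sqlplus, asmcmd, +ASM clients).
find_asm_clients() {
  grep -E 'sqlplus|asmcmd|\+ASM' | grep -v grep
}

# Live usage (output depends on the machine):
#   ps -eo pid,args | find_asm_clients

# Demonstration on a captured sample listing; only the first two
# lines match, /bin/bash is filtered out:
printf '%s\n' \
  '23327 sqlplus / as sysasm' \
  '23333 asmcmd lsdg' \
  '24001 /bin/bash' | find_asm_clients
```

Kill or exit any processes it reports before retrying the stop.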
6. Repair the corrupted disks and make them candidate disks for ASM
# /etc/init.d/oracleasm deletedisk clus1
Removing ASM disk "clus1": [ OK ]
# /etc/init.d/oracleasm deletedisk clus3
Removing ASM disk "clus3": [ OK ]
# /etc/init.d/oracleasm createdisk clus1 /dev/sdc2
Marking disk "clus1" as an ASM disk: [ OK ]
# /etc/init.d/oracleasm createdisk clus3 /dev/sdc10
Marking disk "clus3" as an ASM disk: [ OK ]
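The delete/create pairs above can be scripted. A sketch that echoes the commands as a dry run for review before executing them as root; the disk/device pairs are the ones from this scenario:

```shell
#!/bin/sh
# Dry run: print the oracleasm commands for each repaired disk/device pair.
# Review the output, then re-run piped to "sh" as root to execute for real.
for pair in clus1:/dev/sdc2 clus3:/dev/sdc10; do
  disk=${pair%%:*}   # label before the colon
  dev=${pair#*:}     # device path after the colon
  echo "/etc/init.d/oracleasm deletedisk $disk"
  echo "/etc/init.d/oracleasm createdisk $disk $dev"
done
```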
7. Start the cluster in exclusive mode on one node only, as root, with
crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.gipcd' on 'hpc1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'hpc1'
CRS-2676: Start of 'ora.gipcd' on 'hpc1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'hpc1'
CRS-2676: Start of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hpc1'
CRS-2676: Start of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'hpc1'
CRS-2679: Attempting to clean 'ora.diskmon' on 'hpc1'
CRS-2681: Clean of 'ora.diskmon' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'hpc1'
CRS-2676: Start of 'ora.diskmon' on 'hpc1' succeeded
CRS-2676: Start of 'ora.cssd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'hpc1'
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'hpc1'
CRS-2676: Start of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'hpc1'
CRS-2676: Start of 'ora.asm' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'hpc1'
CRS-2676: Start of 'ora.crsd' on 'hpc1' succeeded
Observe the output in crsd.log
2010-08-13 16:46:29.224: [ OCRRAW][3701277264]proprioo: No OCR/OLR devices are usable
2010-08-13 16:46:29.224: [ OCRASM][3701277264]proprasmcl: asmhandle is NULL
2010-08-13 16:46:29.224: [ OCRRAW][3701277264]proprinit: Could not open raw device
2010-08-13 16:46:29.224: [ OCRASM][3701277264]proprasmcl: asmhandle is NULL
2010-08-13 16:46:29.224: [ OCRAPI][3701277264]a_init:16!: Backend init unsuccessful : [26]
2010-08-13 16:46:29.225: [ CRSOCR][3701277264] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=8, opn=kgfoOpenFile01, dep=15056, loc=kgfokge
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +CLUSTERDG.255.4294967295
ORA-17503: ksfdopn:2 Failed to open file +CLUSTERDG.255.4294967295
ORA-15001: disk
] [8]
2010-08-13 16:46:29.225: [ CRSD][3701277264][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-08-13 16:46:29.225: [ CRSD][3701277264] Done.
and in the cluster alert log
2010-08-13 16:46:28.101
[ohasd(23658)]CRS-2765:Resource 'ora.crsd' has failed on server 'hpc1'.
2010-08-13 16:46:29.224
[crsd(24464)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /opt/app/11.2.0/grid/log/hpc1/crsd/crsd.log.
2010-08-13 16:46:30.121
[ohasd(23658)]CRS-2765:Resource 'ora.crsd' has failed on server 'hpc1'.
2010-08-13 16:46:30.121
[ohasd(23658)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
If the crsd starts, then stop it with
crsctl stop resource ora.crsd -init
8. Recreate the failed diskgroup
create diskgroup clusterdg normal redundancy failgroup failgp1 disk 'ORCL:CLUS1'
failgroup failgp2 disk 'ORCL:CLUS3' ATTRIBUTE 'compatible.asm' = '11.2';
9. Restore the OCR from backup using ocrconfig, as root
# ocrconfig -restore /opt/app/11.2.0/grid/cdata/crs_10g_cluster/backup_20100813_162434.ocr
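If you are unsure which backup file is the most recent, a sketch that picks the newest *.ocr file by modification time; the BACKUP_DIR path mirrors this scenario and should be adjusted to your own $GRID_HOME/cdata/<cluster_name>:

```shell
#!/bin/sh
# Pick the newest OCR backup by modification time. BACKUP_DIR here is
# this scenario's path; substitute your own Grid home and cluster name.
BACKUP_DIR=${BACKUP_DIR:-/opt/app/11.2.0/grid/cdata/crs_10g_cluster}

# ls -t sorts by modification time, newest first
latest=$(ls -t "$BACKUP_DIR"/*.ocr 2>/dev/null | head -1)
echo "most recent OCR backup: $latest"
# then, as root:  ocrconfig -restore "$latest"
```

`ocrconfig -showbackup` (run as root) also lists the known automatic and manual backups with their timestamps.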
In the cluster alert log
2010-08-13 16:54:03.655
[client(24568)]CRS-1002:The OCR was restored from /opt/app/11.2.0/grid/cdata/crs_10g_cluster/backup_20100813_162434.ocr.
10. Run ocrcheck to verify the cluster registry integrity is intact
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 148348
Used space (kbytes) : 4580
Available space (kbytes) : 143768
ID : 552644455
Device/File Name : +CLUSTERDG
Device/File integrity check succeeded
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
11. Stop the cluster with the force option
crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hpc1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hpc1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hpc1'
CRS-2677: Stop of 'ora.cssd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'hpc1'
CRS-2677: Stop of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'hpc1'
CRS-2677: Stop of 'ora.gipcd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'hpc1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hpc1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
and start the CRS in normal mode on all the nodes
crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Now the cluster is up and running. Verify with cluvfy
cluvfy comp ocr -n all -verbose
Verifying OCR integrity
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ASM Running check passed. ASM is running on all cluster nodes
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
Disk group for ocr location "+CLUSTERDG" available on all the nodes
Checking size of the OCR location "+CLUSTERDG" ...
Size check for OCR location "+CLUSTERDG" successful...
WARNING:
This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check passed
Verification of OCR integrity was successful.
Useful Metalink note
How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]