Sunday, February 14, 2016

Restoring OCR and Vote Disks on Block Devices in an 11gR2 Cluster (11.2.0.4)

Using block or raw devices for OCR and vote disks is not supported for new installations of 11.2, but it is supported for upgraded systems. The following is from the Oracle Clusterware admin guide: "Oracle Universal Installer for Oracle Clusterware 11g release 2 (11.2), does not support the use of raw or block devices. However, if you upgrade from a previous Oracle Clusterware release, then you can continue to use raw or block devices. Oracle recommends that you use Oracle ASM to store OCR and voting disks." On 12c block devices are not supported at all, and the OCR and vote disks must be moved to ASM (or a supported shared file system) before the upgrade.
If so chosen, the OCR and vote disks can remain on block devices, for example after an upgrade from 11.1.0.7 to 11.2.0.4 (or 11.2.0.3). In 10g and 11gR1 clusters the vote disk was backed up using the dd command, but this is not supported on 11.2. The following is from the clusterware admin guide: "The dd commands used to back up and recover voting disks in previous versions of Oracle Clusterware are not supported in Oracle Clusterware 11g release 2 (11.2). Restoring voting disks that were copied using dd or cp commands can prevent the Oracle Clusterware 11g release 2 (11.2) stack from coming up." On 11.2 the vote disks do not need to be backed up separately. Again from the clusterware admin guide: "In Oracle Clusterware 11g release 2 (11.2), you no longer have to back up the voting disk. The voting disk data is automatically backed up in OCR as part of any configuration change and is automatically restored to any voting disk added."
So only the OCR needs to be backed up, and that backup can be used to restore both the OCR and the vote disks. This post shows the steps for restoring OCR and vote disks stored on block devices after all of them have failed or been corrupted. The environment used is an 11.2.0.4 two-node cluster that was previously upgraded from 11.1.0.7.
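Since the OCR backup is the only thing the restore depends on, it is worth taking one on demand before any risky change. Below is a minimal sketch, assuming it is run as root on a cluster node with the Grid home bin directory in the PATH. The backup_ocr function name is made up for illustration; ocrconfig -manualbackup takes an on-demand backup and ocrconfig -showbackup lists the backups the clusterware knows about.

```shell
# Hypothetical helper (name is illustrative): take an on-demand OCR backup,
# then list the known backups so the new file name can be noted down.
# Assumes root privileges and that ocrconfig is in the PATH.
backup_ocr() {
    ocrconfig -manualbackup || return 1
    ocrconfig -showbackup
}
```

Calling backup_ocr before maintenance gives a known good file to pass to ocrconfig -restore later.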
The two OCR files, as reported by ocrcheck:
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     292924
         Used space (kbytes)      :       6452
         Available space (kbytes) :     286472
         ID                       :  675013742
         Device/File Name         :  /dev/sdb1
                                    Device/File integrity check succeeded
         Device/File Name         :  /dev/sde1
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded

         Logical corruption check succeeded
and the vote disks, as reported by crsctl query css votedisk:
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6cc871618b1e5f9fbf795315f36b3c21 (/dev/sdh1) []
 2. ONLINE   8f988bcfda0b5fe0bf783f44e81d3f66 (/dev/sdf1) []
 3. ONLINE   6af6320f0979dff7bf6fa474cea2765a (/dev/sdg1) []
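All of the OCR and vote disks were then deliberately corrupted by overwriting the start of each device with the dd command (do not run this on a cluster you care about):
# for i in /dev/sdb1 /dev/sde1 /dev/sdh1 /dev/sdg1 /dev/sdf1
> do
> dd if=/dev/zero of=$i bs=8192 count=1000
> done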
This leaves the OCR and vote disks unusable. The output below shows the vote disks in the pending offline (PENDOFFL) state.
# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. PENDOFFL 6cc871618b1e5f9fbf795315f36b3c21 (/dev/sdh1) []
 2. PENDOFFL 8f988bcfda0b5fe0bf783f44e81d3f66 (/dev/sdf1) []
 3. PENDOFFL 6af6320f0979dff7bf6fa474cea2765a (/dev/sdg1) []
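The state column in this output is easy to check in a script. The sketch below is only an illustration (the votedisks_not_online name is invented here, and it assumes the column layout shown above): it reads the output of crsctl query css votedisk on standard input and prints the state and device of every vote disk that is not ONLINE.

```shell
# Hypothetical helper: read "crsctl query css votedisk" output on stdin and
# print "<state> <device>" for every voting disk that is not ONLINE.
# Exit status is 0 when every listed disk is ONLINE (also when none are
# listed at all), 1 otherwise.
votedisks_not_online() {
    awk '
        # Data rows start with an index such as "1." followed by the state.
        $1 ~ /^[0-9]+\.$/ && $2 != "ONLINE" {
            gsub(/[()]/, "", $4)   # strip the parentheses around the device
            print $2, $4
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Typical use would be crsctl query css votedisk | votedisks_not_online, run as root or the grid owner.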
The following messages could be observed in ocssd.log:
2016-01-15 17:08:16.921: [    CSSD][3422501184]clssnmvVoteDiskValidation: Voting disk /dev/sdh1 is corrupted
2016-01-15 17:08:16.921: [    CSSD][3422501184]clssnmvWorkerThread: disk /dev/sdh1 corrupted
2016-01-15 17:08:16.921: [    CSSD][3422501184]clssnmvDiskAvailabilityChange: voting file /dev/sdh1 now offline
2016-01-15 17:08:17.405: [   SKGFD][3419347264]Lib :UFS:: closing handle 0x85937a0 for disk :/dev/sdh1:

2016-01-15 17:08:20.746: [    CSSD][3412990272]clssnmvVoteDiskValidation: Voting disk /dev/sdg1 is corrupted
2016-01-15 17:08:20.746: [    CSSD][3412990272]clssnmvWorkerThread: disk /dev/sdg1 corrupted
2016-01-15 17:08:20.746: [    CSSD][3412990272]clssnmvDiskAvailabilityChange: voting file /dev/sdg1 now offline
2016-01-15 17:08:20.901: [   SKGFD][3422501184]Lib :UFS:: closing handle 0x86afd90 for disk :/dev/sdh1:
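To pull just the affected devices out of a long ocssd.log, something like the following sketch could be used. The corrupted_votedisks name is invented for illustration, and the pattern assumes the clssnmvVoteDiskValidation message format shown above.

```shell
# Hypothetical helper: print the distinct voting disks that CSSD reported as
# corrupted. $1 is the path to an ocssd.log file.
corrupted_votedisks() {
    sed -n 's/.*clssnmvVoteDiskValidation: Voting disk \(.*\) is corrupted.*/\1/p' "$1" | sort -u
}
```

On 11.2 the log is normally found under the Grid home, so on this cluster the call would be along the lines of corrupted_votedisks /opt/app/11.2.0/grid/log/rac1/cssd/ocssd.log (that path is an assumption based on the Grid home and host name seen elsewhere in this post).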
Eventually both nodes rebooted, after which the OCR and vote disk restore process began.
Stop the clusterware on all nodes with the -f option.
# crsctl stop crs -f
At times this can take a while. The quickest option was to disable crs and reboot the nodes. Once the nodes are back up, enable crs again but do not start the cluster stack.
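The disable, reboot and enable sequence could look like the sketch below. It is split into two functions because the reboot separates the two phases; both function names are made up for illustration, and both are meant to be run as root on every node. crsctl disable crs and crsctl enable crs only change whether the stack autostarts at boot, they do not start or stop anything themselves.

```shell
# Phase 1 (hypothetical helper): run as root on every node.
crs_disable_and_reboot() {
    crsctl disable crs || return 1   # keep the stack from autostarting at boot
    reboot
}

# Phase 2 (hypothetical helper): run as root on every node once it is back up.
crs_enable_only() {
    crsctl enable crs                # re-enable autostart; nothing is started now
}
```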



Once the clusterware stack is down on all nodes, start it on a single node only, using the -excl and -nocrs options.
# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
If the crsd process is running, stop it (with the -nocrs option it should not have been started).
# crsctl stop resource ora.crsd -init
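Whether crsd is actually running can be checked first. This is a rough sketch, assuming crsctl stat res ora.crsd -init prints a STATE= line for the resource as it normally does on 11.2; the stop_crsd_if_running name is invented for illustration.

```shell
# Hypothetical helper: stop ora.crsd only when its STATE line reports ONLINE.
# Assumes root privileges and that crsctl is in the PATH.
stop_crsd_if_running() {
    if crsctl stat res ora.crsd -init | grep -q '^STATE=ONLINE'; then
        crsctl stop resource ora.crsd -init
    else
        echo "ora.crsd is not running"
    fi
}
```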
Restore the OCR from a backup and verify the result with ocrcheck.
# ocrconfig -restore /opt/app/11.2.0/grid/cdata/cg_11g_cluster/backup_20160115_170721.ocr
# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     292924
         Used space (kbytes)      :       6452
         Available space (kbytes) :     286472
         ID                       :  675013742
         Device/File Name         :  /dev/sdb1
                                    Device/File integrity check succeeded
         Device/File Name         :  /dev/sde1
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

         Logical corruption check succeeded
As the ocrcheck output shows, the OCR files were restored to their original locations, as listed in the ocr.loc file.
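The same check can be scripted. The sketch below is illustrative only (the ocr_healthy name is invented, and it assumes the ocrcheck output layout shown above): it reads ocrcheck output on standard input and succeeds only when the cluster registry check passed and every listed device passed its own integrity check.

```shell
# Hypothetical helper: read ocrcheck output on stdin; exit 0 only if the
# cluster registry check succeeded and each "Device/File Name" entry is
# matched by a "Device/File integrity check succeeded" line.
ocr_healthy() {
    awk '
        /Device\/File Name/                          { devices++ }
        /Device\/File integrity check succeeded/     { ok++ }
        /Cluster registry integrity check succeeded/ { registry = 1 }
        END { exit !(registry && devices > 0 && devices == ok) }
    '
}
```

For example: ocrcheck | ocr_healthy && echo "OCR restore verified".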
At this stage the vote disks are still not listed.
crsctl query css votedisk
Located 0 voting disk(s).
To restore the vote disks, run crsctl add css votedisk and specify the block device locations (crsctl replace votedisk is not applicable to non-ASM locations).
# crsctl add css votedisk /dev/sdh1 /dev/sdg1 /dev/sdf1
Now formatting voting disk: /dev/sdh1.
Now formatting voting disk: /dev/sdg1.
Now formatting voting disk: /dev/sdf1.
CRS-4603: Successful addition of voting disk /dev/sdh1.
CRS-4603: Successful addition of voting disk /dev/sdg1.
CRS-4603: Successful addition of voting disk /dev/sdf1.
[root@rac1 oracle]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   274a68a133e84f65bf3b7b7e966f3862 (/dev/sdh1) []
 2. ONLINE   b0615b1176ef4f12bfdd34c115620249 (/dev/sdg1) []
 3. ONLINE   b1eb954b7a454f32bf0c54f252776c1d (/dev/sdf1) []
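The vote disks are back online, with new File Universal Ids because they were freshly formatted. Stop the clusterware stack on this node
# crsctl stop crs -f
and start the clusterware stack on all nodes
# crsctl start crs -nowait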
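Because -nowait returns immediately, the stack may still be coming up when the prompt returns. A simple way to wait for it is to poll crsctl check crs, as in the sketch below. The wait_for_crs name and its defaults are invented for illustration, and it assumes that crsctl check crs prints four "is online" lines (High Availability Services, Cluster Ready Services, Cluster Synchronization Services and Event Manager) once the stack is fully up.

```shell
# Hypothetical helper: poll "crsctl check crs" until all four daemons report
# online. $1 = maximum attempts (default 30), $2 = seconds between attempts
# (default 10). Returns 0 on success, 1 if the attempts run out.
wait_for_crs() {
    attempts=${1:-30}
    interval=${2:-10}
    while [ "$attempts" -gt 0 ]; do
        # grep -c exits non-zero on a zero count, hence the "|| true".
        online=$(crsctl check crs 2>/dev/null | grep -c 'is online' || true)
        if [ "$online" -eq 4 ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    return 1
}
```

Run wait_for_crs on each node after the start command; crsctl check cluster -all can then be used for a cluster-wide view.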
This concludes the steps for restoring the OCR and vote disks on an 11.2.0.4 cluster when they are stored on block devices.

Useful Metalink notes
How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]
Bug 12543757 : UNABLE TO MOVE VOTING DISK FROM RAW DEVICES TO NFS
