Monday, May 21, 2012

OCR Mirror Size and some other OCR Observations

The Oracle documentation for 11.1 states that for a new installation, partitions of 280MB are sufficient for the clusterware files (ocr, vote disks). This is fine for a new installation, but if the system was created with only one ocr file and an ocrmirror is added later, the new partition must be the same size as the existing ocr. Being 280MB or larger is not enough by itself.
Current ocr configuration.
# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     296940
         Used space (kbytes)      :       3840
         Available space (kbytes) :     293100
         ID                       :  675013742
         Device/File Name         :  /dev/sdb1
                                    Device/File integrity check succeeded
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
There's only one ocr file. Create another disk partition for the ocrmirror which is larger than 280MB but smaller than the current ocr.
Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          37      297171   83  Linux
/dev/sde1               1          36      289138+  83  Linux
Trying to add this partition as an ocrmirror fails:
./ocrconfig -replace ocrmirror /dev/sde1
PROT-22: Storage too small
The log file created in $CRS_HOME/log/rac1/client shows:
Oracle Database 11g CRS Release 11.1.0.7.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
2012-05-18 13:25:21.692: [ OCRCONF][2121096064]ocrconfig starts...
2012-05-18 13:25:21.706: [  OCRCLI][2121096064]proac_replace_dev:[/dev/sde1]: Failed. Retval [31]
2012-05-18 13:25:21.706: [ OCRCONF][2121096064]The input OCR device is too small to mirror the OCR content
2012-05-18 13:25:21.706: [ OCRCONF][2121096064]Exiting [status=failed]...
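The PROT-22 failure can be anticipated by comparing the block counts from the fdisk listing before calling ocrconfig. A minimal sketch: `mirror_big_enough` is a hypothetical helper, not an Oracle tool, and the "at least as large as the current ocr" rule is an assumption inferred from the failure above (this post only confirms that an equal-sized partition works).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the candidate partition has at least
# as many blocks as the current ocr partition (assumed rule, see above).
mirror_big_enough() {
    current_blocks=$1
    candidate_blocks=$2
    [ "$candidate_blocks" -ge "$current_blocks" ]
}

# Block counts taken from the fdisk listing (the trailing "+" is dropped).
if mirror_big_enough 297171 289138; then
    echo "/dev/sde1: large enough"
else
    echo "/dev/sde1: too small to mirror /dev/sdb1"
fi
```

With the values from this system the check reports the partition as too small, matching what ocrconfig returned.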
Recreate the partition with the same size as the ocr and set the ocr ownership and permissions:
Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          37      297171   83  Linux
/dev/sde1               1          37      297171   83  Linux

chown root:oinstall /dev/sde1
chmod 640 /dev/sde1
Change into the $CRS_HOME/bin directory before running ocrconfig. For some reason, running it with the full path fails:
$CRS_HOME/bin/ocrconfig -replace ocrmirror /dev/sde1
PROT-21: Invalid parameter
The generated ocrconfig log file shows:
# more ocrconfig_8335.log
Oracle Database 11g CRS Release 11.1.0.7.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
2012-05-18 14:08:05.745: [ OCRCONF][1000385408]ocrconfig starts...
2012-05-18 14:08:05.753: [  OCRCLI][1000385408]proac_replace_dev:[/dev/sde1]: Failed. Retval [8]
2012-05-18 14:08:05.754: [ OCRCONF][1000385408]The input OCR device either is identical to the other device or cannot be opened
2012-05-18 14:08:05.754: [ OCRCONF][1000385408]Exiting [status=failed]...
Running the same command from the bin directory works:
bin]# ./ocrconfig -replace ocrmirror /dev/sde1

# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     296940
         Used space (kbytes)      :       3840
         Available space (kbytes) :     293100
         ID                       :  675013742
         Device/File Name         :  /dev/sdb1
                                    Device/File integrity check succeeded
         Device/File Name         :  /dev/sde1
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
The ocrmirror is successfully added.
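To confirm the result without reading the whole report, the configured devices can be pulled out of the ocrcheck output. A small sketch, assuming the output format shown above; `ocr_devices` is a hypothetical helper, and the sample text stands in for a live ocrcheck run.

```shell
#!/bin/sh
# Hypothetical helper: print the device of every "Device/File Name" line
# read from standard input (ocrcheck output).
ocr_devices() {
    awk -F': *' '/Device\/File Name/ { print $2 }'
}

# Sample lines copied from the ocrcheck output above; on a live system the
# input would come from running ocrcheck instead.
sample='         Device/File Name         :  /dev/sdb1
                                    Device/File integrity check succeeded
         Device/File Name         :  /dev/sde1
                                    Device/File integrity check succeeded'

printf '%s\n' "$sample" | ocr_devices
```

In the listings in this post the first device is the ocr and the second is the ocrmirror, so two lines of output mean the mirror is configured.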


OCR Corruption
Current ocr configuration
ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     296940
         Used space (kbytes)      :       3880
         Available space (kbytes) :     293060
         ID                       :  675013742
         Device/File Name         :  /dev/sdb1
                                    Device/File integrity check succeeded
         Device/File Name         :  /dev/sde1
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
Corrupt the primary ocr file
# dd if=/dev/zero of=/dev/sdb1 bs=8192 count=15
15+0 records in
15+0 records out
122880 bytes (123 kB) copied, 0.000178 seconds, 690 MB/s
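The byte count reported by dd follows directly from the block size and count, so only the first 120 KiB of the partition is overwritten. A quick check of the arithmetic:

```shell
#!/bin/sh
# bs and count are the values passed to dd above.
bs=8192
count=15
bytes=$((bs * count))
echo "$bytes bytes overwritten"
```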
With a mirrored ocr the system should be able to function without a problem, but for a while ocrcheck, ocrconfig and even a clusterware shutdown all fail.
# ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
The cluster cannot be shut down to do a restore:
crsctl stop crs
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage
No other ocrconfig commands will work either (these are not relevant to the problem at hand):
$CRS_HOME/bin/ocrconfig -overwrite
PROT-19: Cannot proceed while clusterware is running. Shutdown clusterware first

$CRS_HOME/bin/ocrconfig -repair ocr /dev/sdb1
PROT-19: Cannot proceed while clusterware is running. Shutdown clusterware first
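Several different error codes have appeared by this point, so a small lookup can help when scanning logs. This is only a sketch: `ocr_error_text` is a hypothetical helper, and it lists only the codes and messages quoted in this post, not Oracle's full message catalogue.

```shell
#!/bin/sh
# Hypothetical helper: map an error code seen in this post to its message.
# Only codes quoted in the post are listed; anything else is reported as unknown.
ocr_error_text() {
    case "$1" in
        PROT-19)  echo "Cannot proceed while clusterware is running" ;;
        PROT-21)  echo "Invalid parameter" ;;
        PROT-22)  echo "Storage too small" ;;
        PROT-602) echo "Failed to retrieve data from the cluster registry" ;;
        PROC-26)  echo "Error while accessing the physical storage" ;;
        *)        echo "unknown code: $1" ;;
    esac
}

ocr_error_text PROT-22
```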
When ocrcheck is run, an ocrcheck log is generated in $CRS_HOME/log/hostname/client, which reports that the working ocr file does not have enough votes. The log file name is listed in the cluster alert log:
[client(16730)]CRS-1011:OCR cannot determine that the OCR content contains the latest updates. Details in /opt/crs/oracle/product/11.1.0/crs/log/rac1/client/ocrcheck_16730.log.

more ocrcheck_16730.log
Oracle Database 11g CRS Release 11.1.0.7.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
2012-05-18 14:23:10.801: [OCRCHECK][3369040768]ocrcheck starts...
2012-05-18 14:23:10.820: [  OCRRAW][3369040768]propriogid:1: INVALID FORMAT
2012-05-18 14:23:10.820: [  OCRRAW][3369040768]proprioini: disk 1 (/dev/sde1) does not have enough votes (1,2)
2012-05-18 14:23:10.821: [  OCRRAW][3369040768]proprinit: Could not open raw device
2012-05-18 14:23:10.821: [ default][3369040768]a_init:7!: Backend init unsuccessful : [26]
2012-05-18 14:23:10.821: [OCRCHECK][3369040768]Failed to access OCR repository: [PROC-26: Error while accessing the physical storage]
2012-05-18 14:23:10.821: [OCRCHECK][3369040768]Failed to initialize ocrchek2
2012-05-18 14:23:10.821: [OCRCHECK][3369040768]Exiting [status=failed]...
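The device named in the "does not have enough votes" line can be extracted from the log with sed. A minimal sketch, assuming the log line format shown above; `device_lacking_votes` is a hypothetical helper. Note that in this case the line names /dev/sde1, the copy that is still readable, so the other configured device is the one to suspect.

```shell
#!/bin/sh
# Hypothetical helper: print the device named in a
# "does not have enough votes" line read from standard input.
device_lacking_votes() {
    sed -n 's/.*disk [0-9]* (\([^)]*\)) does not have enough votes.*/\1/p'
}

# Log line copied from ocrcheck_16730.log above.
log_line='2012-05-18 14:23:10.820: [  OCRRAW][3369040768]proprioini: disk 1 (/dev/sde1) does not have enough votes (1,2)'

printf '%s\n' "$log_line" | device_lacking_votes
```

On a live system the helper would read the ocrcheck log file named in the cluster alert log instead of the sample line.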
If it's known which disk (ocr or ocrmirror) is corrupted, it can be dropped to resolve the issue. If it's not known which disk is corrupted, this method is of no use.
Since it's known that the ocr was the corrupted disk, drop it:
./ocrconfig -replace ocr
If the ocrmirror is the corrupted disk, the command should be:
./ocrconfig -replace ocrmirror
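The choice between the two drop commands depends only on the role of the corrupted disk. A small sketch that prints, rather than runs, the command for a given role; `drop_command` is a hypothetical helper, and the role has to be determined by the administrator.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the drop command for a given role.
drop_command() {
    case "$1" in
        ocr|ocrmirror) echo "./ocrconfig -replace $1" ;;
        *) echo "usage: drop_command ocr|ocrmirror" >&2; return 1 ;;
    esac
}

drop_command ocr
```

Printing the command first leaves room to review it before running it as root from the $CRS_HOME/bin directory.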
Dropping the corrupted ocr generates the following in the cluster alert log:
2012-05-18 14:30:18.841
[crsd(3915)]CRS-1010:The OCR mirror location /dev/sdb1 was removed.
In the alert log /dev/sdb1 is referred to as the ocr mirror (!?).
After the corrupted disk is dropped, ocrcheck lists the single ocr file in use:
./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     296940
         Used space (kbytes)      :       3880
         Available space (kbytes) :     293060
         ID                       :  675013742
         Device/File Name         :  /dev/sde1
                                    Device/File integrity check succeeded

                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
Add the previously corrupted disk back to return the system to two ocr disks:
./ocrconfig -replace ocrmirror /dev/sdb1

# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     296940
         Used space (kbytes)      :       3880
         Available space (kbytes) :     293060
         ID                       :  675013742
         Device/File Name         :  /dev/sde1
                                    Device/File integrity check succeeded
         Device/File Name         :  /dev/sdb1
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded
         Logical corruption check succeeded
Useful Metalink notes:
OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE) [ID 428681.1]
Information On OCR And Voting Disk In Oracle 10gR2 Clusterware (RAC) [ID 1092293.1]