A! Help: August 2010

Friday, August 27, 2010

"Cache Fusion" Behavior for Result Cache

With Oracle 11gR2 RAC result cache misses will result in result being retreived from another instances. This behavior was not there on 11gR1.

11gR1 PL/SQL Language Refrence guide's Result Caches in Oracle RAC Environment section "Cached results are stored in the system global area (SGA). In an Oracle RAC environment, each database instance has a private function result cache, available only to sessions on that instance. If a required result is missing from the private cache of the local instance, the body of the function executes to compute the result, which is then added to the local cache. The result is not retrieved from the private cache of another instance."

From 11gR2 PL/SQL Language Refrence guide's Result Caches in Oracle RAC Environment section "Cached results are stored in the system global area (SGA). In an Oracle RAC environment, each database instance manages its own local function result cache. However, the contents of the local result cache are accessible to sessions attached to other Oracle RAC instances. If a required result is missing from the result cache of the local instance, the result might be retrieved from the local cache of another instance, instead of being locally computed."

Other behaviors are same such as result cache invalidations, where invalidation is global.

A new hidden parameter has been introduced in 11gR2 to control this result cache behavior

_result_cache_global TRUE (default) boolean (type) Are results available globally across RAC?

v$result_cache_statistics view has two new statistics

Global Hit Count
Global Miss Count

(Undocumented at the time of writing)

Wednesday, August 25, 2010

Restoring OCR & Vote disk due to ASM disk failures - 1

Earlier blog posts were about restoring OCR / Vote disk when only of these resided on ASM diskgroup. In this scenario both OCR and Vote disks reside in the same ASM diskgroup.

Scenario 1.
1. Both OCR and Vote disks are in ASM diskgroup
2. ASM diskgroup has normal redundancy with only three failure groups
3. Only one failure group is affected

This scenario is pretty much similar to OCR and Vote disk failure due to one failure group posted earlier. There's no actual restoring of anything, simply repairing the failed diskgroup would suffice. Even dropping the failed disk would allow the cluster to run normally even restart without any problem.

Due to a bug (Bug 7225720) it may not be possible to delete the ASM disk using oracleasm (metalink notes 402526.1 and 1126996.1). Only way to do delete it using oracleasm is to unmount the diskgroup which is not possible since cluster depend on this diskgroup. Therefore a clusterwide shutdown and restart would be required. Unless the disk was dropped from the diskgroup first the diskgroup will not mount during the restart and since diskgroup doesn't mount cluster will not start.

1. Current clusterware file configuration

ocrcheck
Status of Oracle Cluster Registry is as follows :
        Version                  :          3
        Total space (kbytes)     :     148348
        Used space (kbytes)      :       4580
        Available space (kbytes) :     143768
        ID                       :  552644455
        Device/File Name         : +clusterdg
                                   Device/File integrity check succeeded
                                   Device/File not configured

        Cluster registry integrity check succeeded
        Logical corruption check succeeded  

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   1e1e904b27c74f6abf4256e67c274824 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE   30e33d88487a4f4cbf68242bd8a06726 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE   3fa62d01b5c04fbcbf7160d3014e762c (ORCL:CLUS3) [CLUSTERDG]

2. Disk failure is simulated with a corruption

dd if=/dev/zero of=/dev/sdc2 count=204800 bs=8192
204800+0 records in
204800+0 records out
1677721600 bytes (1.7 GB) copied, 1.68371 seconds, 996 MB/s

3. ocssd.log will list detection of the vote disk corruption

2010-08-25 12:39:48.949: [    CSSD][1231718720]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-25 12:39:48.949: [    CSSD][1231718720]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-08-25 12:39:48.971: [    CLSF][1242208576]Closing handle:0x2aaaac4b0550
2010-08-25 12:39:48.971: [   SKGFD][1242208576]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaaac4a7f40
for disk :ORCL:CLUS1:

2010-08-25 12:39:49.003: [    CSSD][1294657856]clssnmSendingThread: sending status msg to all nodes
2010-08-25 12:39:49.003: [    CSSD][1294657856]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-25 12:39:49.681: [    CLSF][1221228864]Closing handle:0x78b3360
2010-08-25 12:39:49.681: [   SKGFD][1221228864]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x78aae00 for
disk :ORCL:CLUS1:

2010-08-25 12:39:49.950: [    CLSF][1231718720]Closing handle:0x2aaab0049050
2010-08-25 12:39:49.950: [   SKGFD][1231718720]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab00d3c40
for disk :ORCL:CLUS1:

2010-08-25 12:39:51.017: [    CSSD][1137310016]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 2340 > margin 1500 cur_ms 1034186224 lastalive 1034183884
2010-08-25 12:39:53.007: [    CSSD][1137310016]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 4330 > margin 1500 cur_ms 1034188214 lastalive 1034183884
2010-08-25 12:39:53.679: [    CSSD][1221228864]clssnmvDiskOpen: Opening ORCL:CLUS1
2010-08-25 12:39:53.679: [   SKGFD][1221228864]Handle 0x2aaab00d3c40 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-25 12:39:53.679: [    CLSF][1221228864]Opened hdl:0x2aaab0192050 for dev:ORCL:CLUS1:
2010-08-25 12:39:53.685: [    CSSD][1221228864]clssnmvStatusBlkInit: myinfo nodename hpc1, uniqueness 1282729710
2010-08-25 12:39:53.685: [    CSSD][1221228864]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now online
2010-08-25 12:39:53.685: [   SKGFD][1242208576]Handle 0x2aaab020e470 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-25 12:39:53.685: [    CLSF][1242208576]Opened hdl:0x2aaab00e13f0 for dev:ORCL:CLUS1:
2010-08-25 12:39:53.897: [   SKGFD][1231718720]Handle 0x78aae00 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-25 12:39:53.897: [    CLSF][1231718720]Opened hdl:0x78b3360 for dev:ORCL:CLUS1:
2010-08-25 12:39:53.906: [    CSSD][1231718720]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-25 12:39:53.906: [    CSSD][1231718720]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-08-25 12:39:54.003: [    CSSD][1294657856]clssnmSendingThread: sending status msg to all nodes
2010-08-25 12:39:54.003: [    CSSD][1294657856]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-25 12:39:54.687: [    CLSF][1242208576]Closing handle:0x2aaab00e13f0
2010-08-25 12:39:54.687: [   SKGFD][1242208576]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab020e470
for disk :ORCL:CLUS1:

2010-08-25 12:39:54.783: [    CSSD][1221228864]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now online
2010-08-25 12:39:54.783: [   SKGFD][1242208576]Handle 0x2aaab020e470 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-25 12:39:54.783: [    CLSF][1242208576]Opened hdl:0x2aaab0219cf0 for dev:ORCL:CLUS1:

ocrcheck will show the following

ocrcheck
Status of Oracle Cluster Registry is as follows :
        Version                  :          3
        Total space (kbytes)     :     148348
        Used space (kbytes)      :       4580
        Available space (kbytes) :     143768
        ID                       :  552644455
        Device/File Name         : +clusterdg
                                   Device/File integrity check failed
                                   Device/File not configured

        Cluster registry integrity check failed
        Logical corruption check bypassed due to insufficient quorum

vote disk query shows

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   1e1e904b27c74f6abf4256e67c274824 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE   30e33d88487a4f4cbf68242bd8a06726 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE   3fa62d01b5c04fbcbf7160d3014e762c (ORCL:CLUS3) [CLUSTERDG]
Located 3 voting disk(s).

4. Stop the cluster and try to start it on one node and start will fail. Following from the crsd.log

crsctl stop crs
crsctl start crs


2010-08-25 12:49:48.385: [  OCRASM][862130768]proprasmo: kgfoCheckMount returned [6]
2010-08-25 12:49:48.385: [  OCRASM][862130768]proprasmo: The ASM disk group clusterdg is not found or not mounted
2010-08-25 12:49:48.385: [  OCRRAW][862130768]proprioo: Failed to open [+clusterdg]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2010-08-25 12:49:48.385: [  OCRRAW][862130768]proprioo: No OCR/OLR devices are usable
2010-08-25 12:49:48.385: [  OCRASM][862130768]proprasmcl: asmhandle is NULL
2010-08-25 12:49:48.385: [  OCRRAW][862130768]proprinit: Could not open raw device
2010-08-25 12:49:48.385: [  OCRASM][862130768]proprasmcl: asmhandle is NULL
2010-08-25 12:49:48.385: [  OCRAPI][862130768]a_init:16!: Backend init unsuccessful : [26]
2010-08-25 12:49:48.386: [  CRSOCR][862130768] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=8, opn=kgfoOpenFile01, dep=15056, loc=kgfokge
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +CLUSTERDG.255.4294967295
ORA-17503: ksfdopn:2 Failed to open file +CLUSTERDG.255.4294967295
ORA-15001: disk
] [8]
2010-08-25 12:49:48.386: [    CRSD][862130768][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-08-25 12:49:48.386: [    CRSD][862130768] Done.

From the cluster alert log

[ohasd(25408)]CRS-2765:Resource 'ora.crsd' has failed on server 'hpc1'.
2010-08-25 12:49:48.385
[crsd(26171)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /opt/app/11.2.0/grid/log/hpc1/crsd/crsd.log.

5. As said earlier reason for cluster not starting is diskgroup not being mounted. Querying the ASM instance for diskgroup status gives

SQL> select name,state from v$asm_diskgroup;
NAME       STATE
---------- ---------------
CLUSTERDG  DISMOUNTED
DATA       MOUNTED
FLASH      MOUNTED

From the ASM alert log

NOTE: Diskgroup used for Voting files is:
        CLUSTERDG
Diskgroup used for OCR is:CLUSTERDG                         
..
OTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Assigning number (1,1) to disk (ORCL:CLUS2)
NOTE: Assigning number (1,2) to disk (ORCL:CLUS3)   
..
NOTE: group CLUSTERDG: updated PST location: disk 0001 (PST copy 0)
NOTE: group CLUSTERDG: updated PST location: disk 0002 (PST copy 1)
NOTE: start heartbeating (grp 1)
kfdp_query(CLUSTERDG): 5
kfdp_queryBg(): 5
NOTE: group CLUSTERDG: updated PST location: disk 0001 (PST copy 0)
NOTE: group CLUSTERDG: updated PST location: disk 0002 (PST copy 1)
NOTE: Assigning number (1,0) to disk ()
kfdp_query(CLUSTERDG): 6
kfdp_queryBg(): 6
NOTE: group CLUSTERDG: updated PST location: disk 0001 (PST copy 0)
NOTE: group CLUSTERDG: updated PST location: disk 0002 (PST copy 1)
NOTE: cache dismounting (clean) group 1/0x29A4804A (CLUSTERDG)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x29A4804A (CLUSTERDG)
NOTE: cache ending mount (fail) of group CLUSTERDG number=1 incarn=0x29a4804a
kfdp_dismount(): 7
kfdp_dismountBg(): 7
NOTE: De-assigning number (1,0) from disk ()
NOTE: De-assigning number (1,1) from disk (ORCL:CLUS2)
NOTE: De-assigning number (1,2) from disk (ORCL:CLUS3)
ERROR: diskgroup CLUSTERDG was not mounted   
..
WARNING: Disk Group CLUSTERDG containing configured OCR is not mounted
WARNING: Disk Group CLUSTERDG containing voting files is not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "0" is missing from group number "1"
ERROR: ALTER DISKGROUP ALL MOUNT /* asm agent */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
Wed Aug 25 12:49:26 2010
ASM Health Checker found 1 new failures
Wed Aug 25 12:49:28 2010
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.FLASH.dg (unable to communicate with CRSD/OHASD)

6. At this stage diskgroup could be force mounted and failed disk could be force dropped and cluster could be restarted. But this would reduce the redundancy in the diskgroup. Better option would be to reapir the disk and add it to the diskgroup.

# /etc/init.d/oracleasm deletedisk clus1
Removing ASM disk "clus1":                                 [  OK  ]   
# /etc/init.d/oracleasm createdisk clus1 /dev/sdc2
Marking disk "clus1" as an ASM disk:                       [  OK  ]

If the ASM has detected the failed disk and initiated dropping of the disk then following could be observed

SQL> select name from v$asm_disk;

NAME
----------
_DROPPED_0000_CLUSTERDG
CLUS2
CLUS3

If not drop the disk using force option and add the repaired disk

SQL> alter diskgroup clusterdg drop disk clus1;
alter diskgroup clusterdg drop disk clus1
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15084: ASM disk "CLUS1" is offline and cannot be dropped.


SQL> alter diskgroup clusterdg drop disk clus1 force;

SQL> alter diskgroup clusterdg add failgroup fail1 disk 'ORCL:CLUS1';

ASM alert log will show vote disk being added to the new disk

Wed Aug 25 12:56:17 2010
NOTE: Voting file relocation is required in diskgroup CLUSTERDG
NOTE: Attempting voting file relocation on diskgroup CLUSTERDG
NOTE: voting file allocation on grp 1 disk CLUS1

7. Query the vote disks

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. OFFLINE  1e1e904b27c74f6abf4256e67c274824 () []
 2. ONLINE   30e33d88487a4f4cbf68242bd8a06726 (ORCL:CLUS2) [CLUSTERDG]
 3. ONLINE   3fa62d01b5c04fbcbf7160d3014e762c (ORCL:CLUS3) [CLUSTERDG]
 4. ONLINE   91bc749112ac4fc1bf93cfaa74cffbc7 (ORCL:CLUS1) [CLUSTERDG]

8. Check the ocr

ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     148348
         Used space (kbytes)      :       4580
         Available space (kbytes) :     143768
         ID                       :  552644455
         Device/File Name         : +clusterdg
                                    Device/File integrity check succeeded
                                    Device/File not configured

         Cluster registry integrity check succeeded
         Logical corruption check succeeded

9. Stop and start the cluster

crsctl stop crs (if needed using -f option)
crsctl start crs

During the testing once on disk was restored subsequent disks were also corrupted one at a time and restored. In some cases when ocrcheck was run right after the corruption following output could be observed

PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage

instead of Logical corruption check bypassed due to insufficient quorum and cluster would terminate itself. Same restore steps could applied to bring up the cluster.

Restoring Vote disk due to ASM disk failures - 3

Previous blogs showed the restoring of vote disk due to one and two ASM disk failures in the diskgroup containing vote disks. This scenario is when all the disks in the ASM diskgroup containg vote disks are affected.

Scenario 3.
1. Only Vote disks are in ASM diskgroup
2. ASM diskgroup has normal redundancy with only three failure groups
3. All failure groups are affected
4. OCR is located in a separate location (in another diskgroup or block device location - not supported by Oracle only valid during migration could be moved to after installation for testing purposes)

1. Move the vote disks to ASM diskgroup if not done already. Below is the current configuration

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   9da0c9d64b9a4fcabf3d143627e7f3e8 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE   380dba2e5b654f80bfe69d64c5d0c12c (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE   95968700d82b4ff8bf88167a28b080a4 (ORCL:CLUS3) [CLUSTERDG]

2. Identify the disks belonging to the diskgroup and corrupt them to simulate disk failure. One disk being corrupted shown here

# /etc/init.d/oracleasm querydisk -p clus3
Disk "CLUS3" is a valid ASM disk
/dev/sdc10: LABEL="CLUS3" TYPE="oracleasm"

# dd if=/dev/zero of=/dev/sdc10 count=204800 bs=8192
204800+0 records in
204800+0 records out

3. ocssd.log will show the detection of vote disk corruption

2010-08-25 10:33:33.572: [    CSSD][1235470656]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-25 10:33:33.572: [    CSSD][1235470656]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-08-25 10:33:34.263: [    CLSF][1193511232]Closing handle:0x68d6110
2010-08-25 10:33:34.263: [   SKGFD][1193511232]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x68ae880 for
disk :ORCL:CLUS1:

2010-08-25 10:33:34.263: [    CSSD][1193511232]clssnmvScanCompletions: completed 4 items
2010-08-25 10:33:34.270: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS3 sched delay 1680 > margin 1500 cur_ms 1026610624 lastalive 1026608944
2010-08-25 10:33:34.270: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS2 sched delay 1560 > margin 1500 cur_ms 1026610624 lastalive 1026609064
2010-08-25 10:33:34.270: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 1720 > margin 1500 cur_ms 1026610624 lastalive 1026608904
2010-08-25 10:33:34.572: [    CLSF][1235470656]Closing handle:0x2aaab4246560
2010-08-25 10:33:34.572: [   SKGFD][1235470656]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab423b080
for disk :ORCL:CLUS1:

2010-08-25 10:33:35.042: [    CLSF][1214490944]Closing handle:0x2aaab43557f0
2010-08-25 10:33:35.042: [   SKGFD][1214490944]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab43fd730
for disk :ORCL:CLUS1:

2010-08-25 10:33:36.268: [    CSSD][1298409792]clssnmSendingThread: sending status msg to all nodes
2010-08-25 10:33:36.268: [    CSSD][1298409792]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-25 10:33:36.272: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 3720 > margin 1500 cur_ms 1026612624 lastalive 1026608904
2010-08-25 10:33:37.044: [    CSSD][1214490944]clssnmvDiskOpen: Opening ORCL:CLUS1
2010-08-25 10:33:37.044: [   SKGFD][1214490944]Handle 0x2aaab423b080 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-25 10:33:37.044: [    CLSF][1214490944]Opened hdl:0x2aaab41e9170 for dev:ORCL:CLUS1:
2010-08-25 10:33:37.054: [    CSSD][1214490944]clssnmvStatusBlkInit: myinfo nodename hpc1, uniqueness 1282657936
2010-08-25 10:33:37.054: [    CSSD][1214490944]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now online
2010-08-25 10:33:37.054: [   SKGFD][1193511232]Handle 0x682fcc0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-25 10:33:37.054: [    CLSF][1193511232]Opened hdl:0x6906de0 for dev:ORCL:CLUS1:
2010-08-25 10:33:37.575: [   SKGFD][1235470656]Handle 0x2aaab43fd730 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-25 10:33:37.575: [    CLSF][1235470656]Opened hdl:0x2aaab41ee180 for dev:ORCL:CLUS1:
2010-08-25 10:33:37.598: [    CSSD][1235470656]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-25 10:33:37.598: [    CSSD][1235470656]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-08-25 10:33:38.055: [    CLSF][1193511232]Closing handle:0x6906de0
2010-08-25 10:33:38.055: [   SKGFD][1193511232]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x682fcc0 for
disk :ORCL:CLUS1:

2010-08-25 10:33:38.279: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 5730 > margin 1500 cur_ms 1026614634 lastalive 1026608904
2010-08-25 10:33:38.601: [    CLSF][1235470656]Closing handle:0x2aaab41ee180
2010-08-25 10:33:38.601: [   SKGFD][1235470656]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab43fd730
for disk :ORCL:CLUS1:

2010-08-25 10:33:38.658: [    CSSD][1214490944]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now online
2010-08-25 10:33:38.659: [   SKGFD][1193511232]Handle 0x682fcc0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-25 10:33:38.659: [    CLSF][1193511232]Opened hdl:0x68b98f0 for dev:ORCL:CLUS1:
2010-08-25 10:33:39.600: [   SKGFD][1235470656]Handle 0x2aaab43fd730 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-25 10:33:39.600: [    CLSF][1235470656]Opened hdl:0x2aaab4276240 for dev:ORCL:CLUS1:
2010-08-25 10:33:39.689: [    CSSD][1172531520]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS2)
2010-08-25 10:33:39.689: [    CSSD][1172531520]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now offline
2010-08-25 10:33:39.830: [    CLSF][1162041664]Closing handle:0x68a4730
2010-08-25 10:33:39.830: [   SKGFD][1162041664]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x68a4790 for
disk :ORCL:CLUS2:

2010-08-25 10:33:40.271: [    CLSF][1183021376]Closing handle:0x2aaab43fdd20
2010-08-25 10:33:40.271: [   SKGFD][1183021376]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab43fce20
for disk :ORCL:CLUS2:

2010-08-25 10:33:40.690: [    CLSF][1172531520]Closing handle:0x689e5d0
2010-08-25 10:33:40.690: [   SKGFD][1172531520]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x6c43870 for
disk :ORCL:CLUS2:

2010-08-25 10:33:41.278: [    CSSD][1298409792]clssnmSendingThread: sending status msg to all nodes
2010-08-25 10:33:41.278: [    CSSD][1298409792]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-25 10:33:41.281: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS2 sched delay 2450 > margin 1500 cur_ms 1026617634 lastalive 1026615184
2010-08-25 10:33:43.289: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS2 sched delay 4460 > margin 1500 cur_ms 1026619644 lastalive 1026615184
2010-08-25 10:33:43.832: [    CSSD][1162041664]clssnmvDiskOpen: Opening ORCL:CLUS2
2010-08-25 10:33:43.832: [   SKGFD][1162041664]Handle 0x6c43870 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS2:

2010-08-25 10:33:43.832: [    CLSF][1162041664]Opened hdl:0x68f9bc0 for dev:ORCL:CLUS2:
2010-08-25 10:33:43.876: [    CSSD][1162041664]clssnmvStatusBlkInit: myinfo nodename hpc1, uniqueness 1282657936
2010-08-25 10:33:43.876: [    CSSD][1162041664]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now online
2010-08-25 10:33:43.876: [   SKGFD][1183021376]Handle 0x2aaab43fce20 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS2:

2010-08-25 10:33:43.876: [    CLSF][1183021376]Opened hdl:0x2aaab41c1fc0 for dev:ORCL:CLUS2:
2010-08-25 10:33:44.698: [   SKGFD][1172531520]Handle 0x68a4790 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS2:

2010-08-25 10:33:44.698: [    CLSF][1172531520]Opened hdl:0x6c57440 for dev:ORCL:CLUS2:
2010-08-25 10:33:44.707: [    CSSD][1172531520]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS2)
2010-08-25 10:33:44.707: [    CSSD][1172531520]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now offline
2010-08-25 10:33:44.820: [    CSSD][1162041664]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now online
2010-08-25 10:33:45.289: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 1750 > margin 1500 cur_ms 1026621644 lastalive 1026619894
2010-08-25 10:33:50.296: [    CSSD][1298409792]clssnmSendingThread: sent 4 status msgs to all nodes
2010-08-25 10:33:53.311: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 1660 > margin 1500 cur_ms 1026629664 lastalive 1026628004
2010-08-25 10:34:13.342: [    CSSD][1298409792]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-25 10:34:14.984: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS2 sched delay 1560 > margin 1500 cur_ms 1026651334 lastalive 1026649774
2010-08-25 10:34:14.984: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 1580 > margin 1500 cur_ms 1026651334 lastalive 1026649754
2010-08-25 10:34:16.398: [    CSSD][1224980800]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS3)
2010-08-25 10:34:16.398: [    CSSD][1224980800]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS3 now offline
2010-08-25 10:34:16.830: [    CLSF][1204001088]Closing handle:0x68a5220
2010-08-25 10:34:16.830: [   SKGFD][1204001088]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x647b4f0 for
disk :ORCL:CLUS3:

2010-08-25 10:34:17.074: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 1700 > margin 1500 cur_ms 1026653424 lastalive 1026651724
2010-08-25 10:34:17.346: [    CLSF][1245960512]Closing handle:0x68f7160
2010-08-25 10:34:17.346: [   SKGFD][1245960512]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x6d04e50 for
disk :ORCL:CLUS3:

2010-08-25 10:34:17.350: [    CSSD][1298409792]clssnmSendingThread: sending status msg to all nodes
2010-08-25 10:34:17.350: [    CSSD][1298409792]clssnmSendingThread: sent 4 status msgs to all nodes
2010-08-25 10:34:17.395: [    CLSF][1224980800]Closing handle:0x2aaab019fce0
2010-08-25 10:34:17.395: [   SKGFD][1224980800]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab0115350
for disk :ORCL:CLUS3:

2010-08-25 10:34:19.164: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS3 sched delay 3340 > margin 1500 cur_ms 1026655514 lastalive 1026652174
2010-08-25 10:34:19.164: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 2070 > margin 1500 cur_ms 1026655514 lastalive 1026653444
2010-08-25 10:34:20.828: [    CSSD][1204001088]clssnmvDiskOpen: Opening ORCL:CLUS3
2010-08-25 10:34:20.828: [   SKGFD][1204001088]Handle 0x647b4f0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS3:

2010-08-25 10:34:20.828: [    CLSF][1204001088]Opened hdl:0x682fc30 for dev:ORCL:CLUS3:
2010-08-25 10:34:20.848: [    CSSD][1204001088]clssnmvStatusBlkInit: myinfo nodename hpc1, uniqueness 1282657936
2010-08-25 10:34:20.848: [    CSSD][1204001088]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS3 now online
2010-08-25 10:34:20.848: [   SKGFD][1245960512]Handle 0x6c56ac0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS3:

2010-08-25 10:34:20.848: [    CLSF][1245960512]Opened hdl:0x640ea10 for dev:ORCL:CLUS3:
2010-08-25 10:34:21.254: [    CSSD][1141061952]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS3 sched delay 5430 > margin 1500 cur_ms 1026657604 lastalive 1026652174
2010-08-25 10:34:21.254: [   SKGFD][1224980800]Handle 0x2aaab02def70 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS3:

2010-08-25 10:34:21.254: [    CLSF][1224980800]Opened hdl:0x2aaab01a40e0 for dev:ORCL:CLUS3:
2010-08-25 10:34:21.284: [    CSSD][1224980800]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS3)
2010-08-25 10:34:21.284: [    CSSD][1224980800]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS3 now offline
2010-08-25 10:34:21.358: [    CSSD][1298409792]clssnmSendingThread: sending status msg to all nodes
2010-08-25 10:34:21.358: [    CSSD][1298409792]clssnmSendingThread: sent 4 status msgs to all nodes
2010-08-25 10:34:21.849: [    CLSF][1245960512]Closing handle:0x640ea10
2010-08-25 10:34:21.849: [   SKGFD][1245960512]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x6c56ac0 for
disk :ORCL:CLUS3:

2010-08-25 10:34:22.286: [    CLSF][1224980800]Closing handle:0x2aaab01a40e0
2010-08-25 10:34:22.286: [   SKGFD][1224980800]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab02def70
for disk :ORCL:CLUS3:

2010-08-25 10:34:22.313: [    CSSD][1204001088]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS3 now online
2010-08-25 10:34:22.313: [   SKGFD][1245960512]Handle 0x6c56ac0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS3:

2010-08-25 10:34:22.313: [    CLSF][1245960512]Opened hdl:0x641fd10 for dev:ORCL:CLUS3:
2010-08-25 10:34:23.285: [   SKGFD][1224980800]Handle 0x2aaab02def70 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS3:

2010-08-25 10:34:23.285: [    CLSF][1224980800]Opened hdl:0x2aaab019fce0 for dev:ORCL:CLUS3:
2010-08-25 10:34:25.366: [    CSSD][1298409792]clssnmSendingThread: sending status msg to all nodes
2010-08-25 10:34:25.366: [    CSSD][1298409792]clssnmSendingThread: sent 4 status msgs to all nodes

Even with all three vote disks gone cluster will be running and vote disks will show online status

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   9da0c9d64b9a4fcabf3d143627e7f3e8 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE   380dba2e5b654f80bfe69d64c5d0c12c (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE   95968700d82b4ff8bf88167a28b080a4 (ORCL:CLUS3) [CLUSTERDG]

4. Trying to stop and restart the cluster will result in a failure. Following message could be seen on occsd.log

2010-08-25 10:39:44.388: [    CSSD][1155574080]clssnmvDiskVerify: Successful discovery of 0 disks
2010-08-25 10:39:44.388: [    CSSD][1155574080]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2010-08-25 10:39:44.388: [    CSSD][1155574080]clssnmvFindInitialConfigs: No voting files found
2010-08-25 10:39:44.388: [    CSSD][1155574080]###################################
2010-08-25 10:39:44.389: [    CSSD][1155574080]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread
2010-08-25 10:39:44.389: [    CSSD][1155574080]###################################
2010-08-25 10:39:44.389: [    CSSD][1155574080]

To restore the vote disks stop crs on all nodes and start crs in exclusive mode on one node. Querying vote disks will show zero vote disks present.

crsctl stop crs
crsctl start crs -excl

crsctl query css votedisk
Located 0 voting disk(s).

5. Repair the ASM disks

# /etc/init.d/oracleasm  deletedisk CLUS1
Removing ASM disk "CLUS1":                                 [  OK  ]
# /etc/init.d/oracleasm  deletedisk CLUS2
Removing ASM disk "CLUS2":                                 [  OK  ]
# /etc/init.d/oracleasm  deletedisk CLUS3
Removing ASM disk "CLUS3":                                 [  OK  ]

# /etc/init.d/oracleasm  createdisk clus1 /dev/sdc2
Marking disk "clus1" as an ASM disk:                       [  OK  ]
# /etc/init.d/oracleasm  createdisk clus2 /dev/sdc3
Marking disk "clus2" as an ASM disk:                       [  OK  ]
# /etc/init.d/oracleasm  createdisk clus3 /dev/sdc10
Marking disk "clus3" as an ASM disk:                       [  OK  ]

6. Start the ASM instance, no diskgroup will be mounted and diskgroup used for storing vote disks will disappear from the diskgroup list.

sqlplus  / as sysasm

SQL> startup mount
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2212656 bytes
Variable Size             256552144 bytes
ASM Cache                  25165824 bytes
ORA-15110: no diskgroups mounted   

SQL> select name from v$asm_diskgroup;

NAME
-----
DATA
FLASH

7. Create a new diskgroup to store the vote disks. At the end of the command new diskgroup will be mounted.

SQL> create diskgroup clusterdg failgroup fail1 disk 'ORCL:CLUS1' failgroup fail2 disk 'ORCL:CLUS2' 
failgroup fail3 disk 'ORCL:CLUS3' attribute 'compatible.asm'='11.2';

SQL> select name,state from v$asm_diskgroup;

NAME       STATE
---------- ----------
DATA       DISMOUNTED
FLASH      DISMOUNTED
CLUSTERDG  MOUNTED

Fro ASM alert log

NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup CLUSTERDG was mounted
SUCCESS: create diskgroup clusterdg failgroup fail1 disk 'ORCL:CLUS1' failgroup fail2 disk 'ORCL:CLUS2' failgroup fail3 disk 'ORCL:CLUS3' attribute 'compatible.asm'='11.2'
Wed Aug 25 10:44:55 2010
WARNING: failed to online diskgroup resource ora.CLUSTERDG.dg (unable to communicate with CRSD/OHASD)

8. Restore the vote disks to the newly created diskgroup

crsctl replace votedisk +clusterdg
Successful addition of voting disk 1e1e904b27c74f6abf4256e67c274824
Successful addition of voting disk 30e33d88487a4f4cbf68242bd8a06726
Successful addition of voting disk 3fa62d01b5c04fbcbf7160d3014e762c
Successfully replaced voting disk group with +clusterdg.
CRS-4266: Voting file(s) successfully replaced 

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   1e1e904b27c74f6abf4256e67c274824 (ORCL:CLUS1) [CLUSTERDG]
 2. ONLINE   30e33d88487a4f4cbf68242bd8a06726 (ORCL:CLUS2) [CLUSTERDG]
 3. ONLINE   3fa62d01b5c04fbcbf7160d3014e762c (ORCL:CLUS3) [CLUSTERDG]

From ASM alert log

Wed Aug 25 10:46:31 2010
NOTE: updated gpnp profile ASM diskstring: ORCL:*
Wed Aug 25 10:46:31 2010
NOTE: Creating voting files in diskgroup CLUSTERDG
Wed Aug 25 10:46:31 2010
NOTE: Voting File refresh pending for group 1/0xc488f8d5 (CLUSTERDG)
NOTE: Attempting voting file creation in diskgroup CLUSTERDG
NOTE: voting file allocation on grp 1 disk CLUS1
NOTE: voting file allocation on grp 1 disk CLUS2
NOTE: voting file allocation on grp 1 disk CLUS3

9. Stop the crs on the node stared in exclusive mode and start the crs on all nodes

crsctl stop crs
crsctl start crs

This shows that as long as a valid OCR is available even if all vote disks are corrupted they could be restored since vote disks are backed up as part of OCR.

"The dd commands used to back up and recover voting disks in previous versions of Oracle Clusterware are not supported in Oracle Clusterware 11g release 2 (11.2).

Voting disk management requires a valid and working OCR. Before you add, delete, replace, or restore voting disks, run the ocrcheck command as root. If OCR is not available or it is corrupt, then you must restore OCR.

The voting disk data is automatically backed up in OCR as part of any configuration change and is automatically restored to any voting disk added.

If all of the voting disks are corrupted, then Restore OCR. This step is necessary only if OCR is also corrupted or otherwise unavailable, such as if OCR is on Oracle ASM and the disk group is no longer available"(11gR2 Clusterware Admin Guide)

Whereas in 10g if vote disks are corrupted and no backup available then crs had to be reinstalled.

Useful Metalink note
How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]

Tuesday, August 24, 2010

Restoring Vote disk due to ASM disk failures - 2

Previous blog was about failure of one failure group in the ASM diskgroup where the vote disks resides. This blog is about two failure groups being affected.

Scenario 2.
1. Only Vote disks are in ASM diskgroup
2. ASM diskgroup has normal redundancy with only three failure groups
3. Only two failure groups are affected
4. OCR is located in a separate location (in another diskgroup or block device location - not supported by Oracle only valid during migration could be moved to after installation for testing purposes)

1. It is assumed thatvote disks are already in ASM diskgroup, if not move to ASM diskgroup. Current vote disk configuration is as follows

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   8ac9c8c3ed694f3cbf36ded6f9587a24 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE   7610f9e2fc134fdabf97e758e7f017d2 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE   7a1a2e08ea384f20bf8be3a54e389622 (ORCL:CLUS3) [CLUSTERDG]

2. Identify two disks in the diskgroup where vote disks resides and corrupt them to simulate disk failure

# /etc/init.d/oracleasm querydisk -p clus1
Disk "CLUS1" is a valid ASM disk
/dev/sdc2: LABEL="CLUS1" TYPE="oracleasm"
# /etc/init.d/oracleasm querydisk -p clus2
Disk "CLUS2" is a valid ASM disk
/dev/sdc3: LABEL="CLUS2" TYPE="oracleasm"          

dd if=/dev/zero of=/dev/sdc2 count=20480 bs=8192
20480+0 records in
20480+0 records out
167772160 bytes (168 MB) copied, 0.158475 seconds, 1.1 GB/s

dd if=/dev/zero of=/dev/sdc3 count=20480 bs=8192
20480+0 records in
20480+0 records out
167772160 bytes (168 MB) copied, 0.160278 seconds, 1.0 GB/s

3. ocssd.log will have the following entries showing the detection of vote disk corruption for both disks

2010-08-24 13:13:26.362: [    CSSD][1261078848]clssnmSendingThread: sending status msg to all nodes
2010-08-24 13:13:26.362: [    CSSD][1261078848]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-24 13:13:30.640: [    CSSD][1418426688]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-24 13:13:30.640: [    CSSD][1418426688]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-08-24 13:13:31.052: [    CLSF][1397446976]Closing handle:0x2aaab00f6b90
2010-08-24 13:13:31.052: [   SKGFD][1397446976]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab0064e90
for disk :ORCL:CLUS1:

2010-08-24 13:13:31.372: [    CSSD][1261078848]clssnmSendingThread: sending status msg to all nodes
2010-08-24 13:13:31.372: [    CSSD][1261078848]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-24 13:13:31.380: [    CSSD][1135200576]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS3 sched delay 1560 > margin 1500 cur_ms 949819404 lastalive 949817844
2010-08-24 13:13:31.478: [    CLSF][1365977408]Closing handle:0x13bf5d0
2010-08-24 13:13:31.478: [   SKGFD][1365977408]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x1943e00 for
disk :ORCL:CLUS1:

2010-08-24 13:13:31.478: [    CSSD][1365977408]clssnmvScanCompletions: completed 6 items
2010-08-24 13:13:31.642: [    CLSF][1418426688]Closing handle:0x2aaab830a8f0
2010-08-24 13:13:31.642: [   SKGFD][1418426688]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab805ac60
for disk :ORCL:CLUS1:

2010-08-24 13:13:32.932: [    CSSD][1135200576]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 2880 > margin 1500 cur_ms 949820954 lastalive 949818074
2010-08-24 13:13:34.075: [    CSSD][1397446976]clssnmvDiskOpen: Opening ORCL:CLUS1
2010-08-24 13:13:34.075: [   SKGFD][1397446976]Handle 0x2aaab0064e90 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-24 13:13:34.075: [    CLSF][1397446976]Opened hdl:0x2aaab0002110 for dev:ORCL:CLUS1:
2010-08-24 13:13:34.093: [    CSSD][1397446976]clssnmvStatusBlkInit: myinfo nodename hpc1, uniqueness 1282573835
2010-08-24 13:13:34.093: [    CSSD][1397446976]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now online
2010-08-24 13:13:34.093: [   SKGFD][1365977408]Handle 0x18ee190 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-24 13:13:34.093: [    CLSF][1365977408]Opened hdl:0x1835260 for dev:ORCL:CLUS1:
2010-08-24 13:13:34.655: [   SKGFD][1418426688]Handle 0x2aaab805ac60 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-24 13:13:34.655: [    CLSF][1418426688]Opened hdl:0x2aaab8085e90 for dev:ORCL:CLUS1:
2010-08-24 13:13:34.666: [    CSSD][1418426688]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-24 13:13:34.666: [    CSSD][1418426688]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-08-24 13:13:34.867: [    CSSD][1397446976]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now online 
2010-08-24 13:13:36.372: [    CSSD][1261078848]clssnmSendingThread: sending status msg to all nodes
2010-08-24 13:13:36.372: [    CSSD][1261078848]clssnmSendingThread: sent 5 status msgs to all nodes
...
2010-08-24 13:14:25.608: [    CSSD][1135200576]clssgmRegisterShared: global grock DBCLUSDB member 0 share type 1, refcount 13
2010-08-24 13:14:25.609: [    CSSD][1135200576]clssgmExecuteClientRequest: Node name request from client ((nil))
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssscSelect: cookie accept request 0x2aaaac0faea0
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssscevtypSHRCON: getting client with cmproc 0x2aaaac0faea0
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssgmRegisterClient: proc(33/0x2aaaac0faea0), client(2/0x2aaaac105330)
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssgmExecuteClientRequest: GRPSHREG recvd from client 2 (0x2aaaac105330)
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssgmRegisterShared: grp DG_LOCAL_DATA, mbr 0, type 1
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssgmQueueShare: (0x2aaaac183a90) target local grock DG_LOCAL_DATA member 0 type
1 queued from client (0x2aaaac105330), local grock DG_LOCAL_DATA, refcount 11
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssgmRegisterShared: local grock DG_LOCAL_DATA member 0 share type 1, refcount 112010-08-24 13:14:30.139: [    CSSD][1166670144]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS2)
2010-08-24 13:14:30.139: [    CSSD][1166670144]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now offline
2010-08-24 13:14:30.190: [    CLSF][1156180288]Closing handle:0x2aaab0009a20
2010-08-24 13:14:30.190: [   SKGFD][1156180288]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab02179d0
for disk :ORCL:CLUS2:
2010-08-24 13:14:30.380: [    CSSD][1261078848]clssnmSendingThread: sending status msg to all nodes
2010-08-24 13:14:30.380: [    CSSD][1261078848]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-24 13:14:30.594: [    CLSF][1187649856]Closing handle:0x2aaab809bae0
2010-08-24 13:14:30.594: [   SKGFD][1187649856]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab80e6340
for disk :ORCL:CLUS2:

2010-08-24 13:14:31.140: [    CLSF][1166670144]Closing handle:0x2aaaac19fac0
2010-08-24 13:14:31.140: [   SKGFD][1166670144]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaaac188e90
for disk :ORCL:CLUS2:

2010-08-24 13:14:31.299: [    CSSD][1135200576]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS2 sched delay 2110 > margin 1500 cur_ms 949879314 lastalive 949877204
2010-08-24 13:14:33.202: [    CSSD][1156180288]clssnmvDiskOpen: Opening ORCL:CLUS2
2010-08-24 13:14:33.202: [   SKGFD][1156180288]Handle 0x2aaab02179d0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS2:

2010-08-24 13:14:33.202: [    CLSF][1156180288]Opened hdl:0x2aaab0101b50 for dev:ORCL:CLUS2:
2010-08-24 13:14:33.236: [    CSSD][1156180288]clssnmvStatusBlkInit: myinfo nodename hpc1, uniqueness 1282573835
2010-08-24 13:14:33.236: [    CSSD][1156180288]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now online
2010-08-24 13:14:33.236: [   SKGFD][1187649856]Handle 0x2aaab82018a0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS2:

2010-08-24 13:14:33.236: [    CLSF][1187649856]Opened hdl:0x2aaab8202320 for dev:ORCL:CLUS2:
2010-08-24 13:14:33.379: [    CSSD][1135200576]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS2 sched delay 4190 > margin 1500 cur_ms 949881394 lastalive 949877204
2010-08-24 13:14:34.081: [   SKGFD][1166670144]Handle 0x2aaaac188e90 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS2:

2010-08-24 13:14:34.081: [    CLSF][1166670144]Opened hdl:0x2aaaac1aada0 for dev:ORCL:CLUS2:
2010-08-24 13:14:34.113: [    CSSD][1166670144]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS2)
2010-08-24 13:14:34.113: [    CSSD][1166670144]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now offline
2010-08-24 13:14:34.113: [    CSSD][1156180288]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now online
2010-08-24 13:14:35.380: [    CSSD][1261078848]clssnmSendingThread: sending status msg to all nodes
2010-08-24 13:14:35.380: [    CSSD][1261078848]clssnmSendingThread: sent 5 status msgs to all nodes

4. Stop and try to start the crs, which will fail on start. ocssd.log will have the list the reason

crsctl stop crs
crsctl start crs

2010-08-24 13:18:29.130: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.130: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.130: [    CSSD][1149827392]clssnmvDiskVerify: discovered a potential voting file
2010-08-24 13:18:29.130: [   SKGFD][1149827392]Handle 0x17870510 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS3:

2010-08-24 13:18:29.130: [    CLSF][1149827392]Opened hdl:0x1780db80 for dev:ORCL:CLUS3:
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: Successful discovery for disk ORCL:CLUS3, UID 7a1a2e08-ea384f20-bf8be3a5-4e389622, Pending CIN 0:1282650882:0, Committed CIN 0:1282650882:0
2010-08-24 13:18:29.140: [    CLSF][1149827392]Closing handle:0x1780db80
2010-08-24 13:18:29.140: [   SKGFD][1149827392]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x17870510 for
disk :ORCL:CLUS3:

2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: Successful discovery of 1 disks
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmCompleteVFDiscovery: Completing voting file discovery
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskStateChange: state from discovered to pending disk ORCL:CLUS3
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskStateChange: state from pending to configured disk ORCL:CLUS3
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvVerifyCommittedConfigVFs: Insufficient voting files found, found 1 of 3 configured, needed 2 voting files
2010-08-24 13:18:29.140: [    CSSD][1149827392](:CSSNM00020:)clssnmvVerifyCommittedConfigVFs: voting file 0, id 8ac9c8c3-ed694f3c-bf36ded6-f9587a24 not found
2010-08-24 13:18:29.140: [    CSSD][1149827392](:CSSNM00020:)clssnmvVerifyCommittedConfigVFs: voting file 1, id 7610f9e2-fc134fda-bf97e758-e7f017d2 not found
2010-08-24 13:18:29.140: [    CSSD][1149827392]ASSERT clssnm1.c 2829
2010-08-24 13:18:29.140: [    CSSD][1149827392](:CSSNM00021:)clssnmCompleteVFDiscovery: Found 1 voting files, but 2 are required.
Terminating due to insufficient configured voting files
2010-08-24 13:18:29.140: [    CSSD][1149827392]###################################
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssscExit: CSSD aborting from thread clssnmvDDiscThread
2010-08-24 13:18:29.140: [    CSSD][1149827392]###################################
2010-08-24 13:18:29.141: [    CSSD][1149827392]

Cluster alert log will also have information on the missing vote disks

2010-08-24 13:18:29.140
[cssd(13590)]CRS-1637:Unable to locate configured voting file with ID 8ac9c8c3-ed694f3c-bf36ded6-f9587a24; details at (:CSSNM00020:) in /opt/app/11.2.0/grid/log/hpc1/cssd/ocssd.log
2010-08-24 13:18:29.140
[cssd(13590)]CRS-1637:Unable to locate configured voting file with ID 7610f9e2-fc134fda-bf97e758-e7f017d2; details at (:CSSNM00020:) in /opt/app/11.2.0/grid/log/hpc1/cssd/ocssd.log
2010-08-24 13:18:29.140
[cssd(13590)]CRS-1705:Found 1 configured voting files but 2 voting files are required, terminating to ensure data integrity; details at (:CSSNM00021:) in /opt/app/11.2.0/grid/log/hpc1/cssd/ocssd.log

Checking crs status will show everything online

crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

5. Stop the crs on all nodes if required use -f option (force) and start crs on one node in exclusive mode which doesn't require the use of vote disk.

crsctl stop crs -f
crsctl start crs -excl

6. Once started query the vote disks

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. OFFLINE  8ac9c8c3ed694f3cbf36ded6f9587a24 () []
2. OFFLINE  7610f9e2fc134fdabf97e758e7f017d2 () []
3. ONLINE   7a1a2e08ea384f20bf8be3a54e389622 (ORCL:CLUS3) [CLUSTERDG]
Located 3 voting disk(s).

7. There are several possibilities to restore the vote disks from this point onwards. First shown is adding a new diskgroup and moving the vote disks to the new disk group. It is also possible to repair the exisiting disk and reuse them, which is shown later on. At this stage ASM instance is not mounted, to create a diskgroup mount the ASM instance.

sqlplus  / as sysasm

SQL> startup mount
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2212656 bytes
Variable Size             256552144 bytes
ASM Cache                  25165824 bytes
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CLUSTERDG" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup
"CLUSTERDG"

From ASM alert log

Tue Aug 24 13:28:03 2010
SQL> ALTER DISKGROUP ALL MOUNT
NOTE: Diskgroup used for Voting files is:
       CLUSTERDG
NOTE: cache registered group CLUSTERDG number=1 incarn=0xd5443b06
NOTE: cache began mount (first) of group CLUSTERDG number=1 incarn=0xd5443b06
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Assigning number (1,2) to disk (ORCL:CLUS3)
ERROR: no PST quorum in group: required 2, found 1
NOTE: cache dismounting (clean) group 1/0xD5443B06 (CLUSTERDG)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xD5443B06 (CLUSTERDG)
NOTE: cache ending mount (fail) of group CLUSTERDG number=1 incarn=0xd5443b06
kfdp_dismount(): 2
kfdp_dismountBg(): 2
NOTE: De-assigning number (1,2) from disk (ORCL:CLUS3)
ERROR: diskgroup CLUSTERDG was not mounted
NOTE: cache deleting context for group CLUSTERDG 1/-716948730
WARNING: Disk Group CLUSTERDG containing voting files is not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CLUSTERDG" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "CLUSTERDG"
ERROR: ALTER DISKGROUP ALL MOUNT

Since two failure groups were affected the diskgroup holding the vote disks won't be mounted as well as any other disk groups. They must be explicitly mounted. But since ASM instnace is up a new diskgroup could be created.

8. Create a new diskgroup and move the vote disks.

create diskgroup votedg quorum failgroup fail1 disk 'ORCL:RED1' failgroup fail2 disk 'ORCL:RED2' failgroup fail3 disk 'ORCL:RED3' attribute 'compatible.asm'='11.2';

crsctl replace votedisk +votedg
Successful addition of voting disk a24f198797e64f78bff42b8721b964d2
Successful addition of voting disk 9d33c71328c74f2fbf86d3e5a078b648
Successful addition of voting disk 88b2c9201e9e4f4ebf0732813b5be8f1

Successful deletion of voting disk 8ac9c8c3ed694f3cbf36ded6f9587a24.
Successful deletion of voting disk 7610f9e2fc134fdabf97e758e7f017d2.
Successful deletion of voting disk 7a1a2e08ea384f20bf8be3a54e389622.
Successfully replaced voting disk group with +votedg.
CRS-4266: Voting file(s) successfully replaced      

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   a24f198797e64f78bff42b8721b964d2 (ORCL:RED1) [VOTEDG]
2. ONLINE   9d33c71328c74f2fbf86d3e5a078b648 (ORCL:RED2) [VOTEDG]
3. ONLINE   88b2c9201e9e4f4ebf0732813b5be8f1 (ORCL:RED3) [VOTEDG]
Located 3 voting disk(s).

"A quorum failure group is a special type of failure group and disks in these failure groups do not contain user data and are not considered when determining redundancy requirements" (Storage Admin Guide).

9. Stop crs on the node started in exclusive mode and start the crs on all nodes as normal

crsctl stop crs
crsctl start crs

10. The old diskgroup name will still be there in the ASM instance and may need some clean up activities. Since a normal redundancy diskgroup can only tolerate failure of one failure group, the diskgroup in question won't mount even with force option.

"Oracle ASM provides a MOUNT FORCE option with ALTER DISKGROUP to enable Oracle ASM disk groups to be mounted in normal or high redundancy modes even though some Oracle ASM disks may be unavailable to the disk group at mount time.

The default behavior without the FORCE option is to fail to mount a disk group that has damaged or missing disks.

The MOUNT FORCE option is useful in situations where a disk is temporarily unavailable and you want to mount the disk group with reduced redundancy while you correct the situation that caused the outage.

To successfully mount with the MOUNT FORCE option, Oracle ASM must be able to find at least one copy of the extents for all of the files in the disk group. In this case, Oracle ASM can successfully mount the disk group, but with potentially reduced redundancy.

The MOUNT FORCE option is useful in situations where a disk is temporarily unavailable and you want to mount the disk group with reduced redundancy while you correct the situation that caused the outage.

In clustered ASM environments, if an ASM instance is not the first instance to mount the disk group, then using the MOUNT FORCE statement fails. This is because the disks have been accessed by another instance and the disks are not locally accessible.

The FORCE option corrects configuration errors, such as incorrect values for ASM_DISKSTRING, without incurring unnecessary rebalance operations. Disk groups mounted with the FORCE option have one or more disks offline if the disks were not available at the time of the mount. You must take corrective action to restore those devices before the time set with the DISK_REPAIR_TIME value expires. Failing to restore and put those disks back online within the disk repair time frame results in Oracle ASM automatically removing the disks from the disk group".(Storage Admin Guide)

SQL> alter diskgroup clusterdg mount force;
alter diskgroup clusterdg mount force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CLUSTERDG" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup
"CLUSTERDG"

Solution is to assign the surviving disk forcefuly to a new diskgroup and drop it later."Caution: Use extreme care when using the FORCE option because the Oracle ASM instance does not verify whether the disk group is used by any other Oracle ASM instance before Oracle ASM deletes the disk group" (Storage Admin Guide)

SQL> create diskgroup dummydg disk 'ORCL:CLUS1' DISK 'ORCL:CLUS2' DISK 'ORCL:CLUS3' FORCE;
SQL> DROP diskgroup dummydg;

Delete the old diskgroup information from the cluster

crsctl stat res -p | grep dg
NAME=ora.CLUSTERDG.dg
NAME=ora.DATA.dg
NAME=ora.FLASH.dg
NAME=ora.VOTEDG.dg              

crsctl delete resource ora.CLUSTERDG.dg

Intead of step 8 above where a new diskgroup was created, failed disks could be repaired and used again to store the vote disks. Two alternative options are listed below.

Option 1

8. (Step 1 to 7 same as above). Repair the failed disks

# /etc/init.d/oracleasm deletedisk clus2
Removing ASM disk "clus2":                                 [  OK  ]
# /etc/init.d/oracleasm deletedisk clus1
Removing ASM disk "clus1":                                 [  OK  ]
# /etc/init.d/oracleasm createdisk clus1 /dev/sdc2
Marking disk "clus1" as an ASM disk:                       [  OK  ]
# /etc/init.d/oracleasm createdisk clus2 /dev/sdc3
Marking disk "clus2" as an ASM disk:                       [  OK  ]

9. Querying vote disk will show

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. OFFLINE  4d2e63d262504f4bbff82fdbf6a24869 () []
 2. OFFLINE  b161fa684ad04fa9bfd259c8cfaa2288 () []
 3. ONLINE   6d7c8162acae4f65bffb635dd7892b6a (ORCL:CLUS3) [CLUSTERDG]

10. Start the ASM instance as mentioned earlier and create a diskgroup

create diskgroup clusterdgbk disk 'ORCL:CLUS1' DISK 'ORCL:CLUS2' DISK 'ORCL:CLUS3' force attribute 'compatible.asm'='11.2';

When the statement finishes diskgroup would be in mounted state."Caution: Use extreme care when using the FORCE option because the Oracle ASM instance does not verify whether the disk group is used by any other Oracle ASM instance before Oracle ASM deletes the disk group"(Storage Admin Guide).

11. Replace the vote disks with the new diskgroup

crsctl replace votedisk +clusterdgbk
CRS-4602: Failed 3 to add voting file 8b63eece18624f20bfdeb8c91a8e142d
CRS-4602: Failed 3 to add voting file 18ff8b664acb4f2bbfe18cf887b05ae7.
CRS-4602: Failed 3 to add voting file 7304aee1fd704f48bf646a27f1f8887f

Failure 3 with Cluster Synchronization Services while deleting voting disk.
Failure 3 with Cluster Synchronization Services while deleting voting disk.
Failure 3 with Cluster Synchronization Services while deleting voting disk.
Failed to replace voting disk group with +clusterdgbk.
CRS-4000: Command Replace failed, or completed with errors.

Output shows everything has failed and end of the command crs would be aborted.

12. Stop any running cluster processes and start the crs in exclusive mode and query the votedisks

crsctl start crs -excl
crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   8b63eece18624f20bfdeb8c91a8e142d (ORCL:CLUS1) [CLUSTERDGBK]
 2. ONLINE   18ff8b664acb4f2bbfe18cf887b05ae7 (ORCL:CLUS2) [CLUSTERDGBK]
 3. ONLINE   7304aee1fd704f48bf646a27f1f8887f (ORCL:CLUS3) [CLUSTERDGBK]
Located 3 voting disk(s).

Vote disks have been restored. Stop crs on the node and start the cluster as normal on all nodes.

Option 2

8. (Step 1 to 7 same as earlier) repair the disks as in option 1.

9. Mount the ASM instance and create a diskgroup

create diskgroup clusterdgbk disk 'ORCL:CLUS1' DISK 'ORCL:CLUS2' DISK 'ORCL:CLUS3' FORCE attribute 'compatible.asm'='11.2';

10. Instead of moving the vote disk to new diskgroup stop the crs on the node and start exclusive mode.

crsctl stop crs
crsctl start crs # will fail
crsctl start crs -excl

11. Query the vote disk which will show 0 vote disk

crsctl query css votedisk
Located 0 voting disk(s).

12. Move the vote disks to new diskgroup (even though earlier command showed 0 vote disks) and query vote disks

crsctl replace votedisk +clusterdgbk
Successful addition of voting disk 82e2f5b2b6fa4f8fbf11f95e6b14b3d7
Successful addition of voting disk 921d7cda8dfc4fe6bf00ff1d19d31726
Successful addition of voting disk 14a06e308ce14f88bfb0b28797a688f0

Successfully replaced voting disk group with +clusterdgbk.
CRS-4266: Voting file(s) successfully replaced    

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   82e2f5b2b6fa4f8fbf11f95e6b14b3d7 (ORCL:CLUS1) [CLUSTERDGBK]
 2. ONLINE   921d7cda8dfc4fe6bf00ff1d19d31726 (ORCL:CLUS2) [CLUSTERDGBK]
 3. ONLINE   14a06e308ce14f88bfb0b28797a688f0 (ORCL:CLUS3) [CLUSTERDGBK]
Located 3 voting disk(s).

Stop the node and start the crs on all nodes as normal.

Useful Metalink note
How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]

Real Time SQL Monitoring to Capture Bind Variables Values

Real Time SQL Monitoring allows the monitoring of SQL statements while they are being executed. By default Oracle will monitor any SQL statement that either runs in parallel or has consumed at least 5 seconds of combined CPU and I/O time in a single execution.

As of 11gR2 Real Time SQL Monitoring also shows bind variable values used in the SQL statements.

Monitoring could be enabled for any statement by adding the /*+ MONITOR */ hint. Thereafter when the statement runs Oracle will monitor the statement.

However when there's no access to the SQL code this method of enabling monitoring is not possible. This is where dbms_sqltune.import_sql_profile comes in handy.

DBMS_SQLTUNE.IMPORT_SQL_PROFILE is an undocumented method which could be used to attache a profile to a SQL.

This is the test case to attache a monitoring profile to the sql to get the bind variables (11gR2) or/and enable monitoring (11gR1 and 11gR2).

SQL> variable var number
SQL> exec :var := 10; 

SQL> select * from x where b not in (select q from y where y.p = x.a) and x.a = :var;

A   B
--- ------
10  75282

SQL> select sql_id from v$sql where sql_text like 
'select * from x where b not in (select q from y where y.p = x.a) and x.a%';

SQL_ID
-------------
3nprf920bb7az  

---- create the profile for the sql
Declare
Sqltext Clob;
Begin
Select Sql_Text Into Sqltext From V$sqlarea Where Sql_Id='3nprf920bb7az';
Dbms_Sqltune.Import_Sql_Profile(
    Sql_Text => Sqltext,
    Profile => Sqlprof_Attr('MONITOR'),
    CATEGORY => 'DEFAULT',
    Name => 'monitor_profile',
    Force_Match => True);
End;
/

SQL> select * from x where b not in (select q from y where y.p = x.a) and x.a = :var;

When the above select statement is executed SQL will appear in the real time monitoring.

Once done drop the profile with

exec dbms_sqltune.DROP_SQL_PROFILE('monitor_profile');

Profile could be attached to update statements as well but monitoring window does not show the bind variables values.

SQL> variable a number;
SQL> variable b number;  
SQL> exec :a := 10;
SQL> exec :b := 20;
SQL> update vartable set a = :a * :b;
SQL> commit;
SQL> select sql_id from v$sqlarea where sql_text like 'update vartable set a = :a * :b';

SQL_ID
-------------
1g2mxvhws6a3x

Declare
Sqltext Clob;
Begin
Select Sql_Text Into Sqltext From V$sqlarea Where Sql_Id='1g2mxvhws6a3x';
Dbms_Sqltune.Import_Sql_Profile(
    Sql_Text => Sqltext,
    Profile => Sqlprof_Attr('MONITOR'),
    CATEGORY => 'DEFAULT',
    Name => 'monitor_profile',
    Force_Match => True);
End;
/

SQL> update vartable set a = :a * :b;

5 rows updated.

SQL Monitoring shows the following for the update statement

For insert statments attaching profile has no effect and they do not show up on the SQL Monitoring unless /*+ Monitor */ hint is used.

SQL> insert /*+ monitor */ into vartable values (:a,:b);

1 row created.

SQL> commit;

Monday, August 23, 2010

Restoring Vote disk due to ASM disk failures - 1

Restoring OCR due to ASM disk failures was blogged previously. This blog looks at restoring vote disks due to ASM disk failures.

Some important pointers from Clusterware Admin Guide. "OCR and voting disks can be stored in Oracle Automatic Storage Management (Oracle ASM). The Oracle ASM partnership and status table (PST) is replicated on multiple disks and is extended to store OCR. Consequently, OCR can tolerate the loss of the same number of disks as are in the underlying disk group and be relocated in response to disk failures.

Oracle ASM reserves several blocks at a fixed location on every Oracle ASM disk for storing the voting disk. Should the disk holding the voting disk fail, Oracle ASM selects another disk on which to store this data. Storing OCR and the voting disk on Oracle ASM eliminates the need for third-party cluster volume managers and eliminates the complexity of managing disk partitions for OCR and voting disks in Oracle Clusterware installations.

The dd commands used to back up and recover voting disks in previous versions of Oracle Clusterware are not supported in Oracle Clusterware 11g release 2 (11.2).

Voting disk management requires a valid and working OCR. Before you add, delete, replace, or restore voting disks, run the ocrcheck command as root. If OCR is not available or it is corrupt, then you must restore OCR.

The voting disk data is automatically backed up in OCR as part of any configuration change and is automatically restored to any voting disk added.

If all of the voting disks are corrupted, then Restore OCR. This step is necessary only if OCR is also corrupted or otherwise unavailable, such as if OCR is on Oracle ASM and the disk group is no longer available.

The number of voting files you can store in a particular Oracle ASM disk group depends upon the redundancy of the disk group.
External redundancy: A disk group with external redundancy can store only one voting disk
Normal redundancy: A disk group with normal redundancy stores three voting disks
High redundancy: A disk group with high redundancy stores five voting disks

By default, Oracle ASM puts each voting disk in its own failure group within the disk group. A normal redundancy disk group must contain at least two failure groups but if you are storing your voting disks on Oracle ASM, then a normal redundancy disk group must contain at least three failure groups." (Oracle Clusterware Admin Guide)

Following error will be thrown if a diskgroup with two failure groups was used to store the vote disks

crsctl replace votedisk +CLUSTERDG
Failed to create voting files on disk group CLUSTERDG.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.

On the ASM log

Thu Aug 19 15:34:29 2010
NOTE: updated gpnp profile ASM diskstring: ORCL:*
Thu Aug 19 15:34:29 2010
NOTE: Creating voting files in diskgroup CLUSTERDG
Thu Aug 19 15:34:29 2010
NOTE: Voting File refresh pending for group 1/0x23d6dd4 (CLUSTERDG)
NOTE: Attempting voting file creation in diskgroup CLUSTERDG
NOTE: voting file allocation on grp 1 disk CLUS2
NOTE: voting file allocation on grp 1 disk CLUS3
ERROR: Voting file allocation failed for group CLUSTERDG
Errors in file /opt/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_20421.trc:
ORA-15273: Could not create the required number of voting files.
NOTE: Voting file relocation is required in diskgroup CLUSTERDG
NOTE: Attempting voting file relocation on diskgroup CLUSTERDG
NOTE: voting file deletion on grp 1 disk CLUS2
NOTE: voting file deletion on grp 1 disk CLUS3

"A high redundancy disk group must contain at least three failure groups. However, Oracle recommends using several failure groups. A small number of failure groups, or failure groups of uneven capacity, can create allocation problems that prevent full use of all of the available storage.
You must specify enough failure groups in each disk group to support the redundancy type for that disk group.

Neither should you add a voting disk to a cluster file system in addition to the voting disks stored in an Oracle ASM disk group. Oracle does not support having voting disks in Oracle ASM and directly on a cluster file system for the same cluster at the same time."(Oracle Clusterware Admin Guide)

Scenario 1.
1. Only Vote disks are in ASM diskgroup
2. ASM diskgroup has normal redundancy with only three failure groups
3. Only one failure group is affected
4. OCR is located in a separate location (in another diskgroup or block device location - not supported by Oracle only valid during migration could be moved to after installation for testing purposes)

All that is required in this scenario is to drop the affected disk from the ASM diskgroup, repair it and add it back to the diskgroup.

1. It is assumed thatvote disks are already in ASM diskgroup, if not move to ASM diskgroup. Current vote disk configuration is as follows

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   393aebdd039e4fdabf242bf461c45136 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE   04daf60741d34f9abf22e76fe22913b2 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE   2f38bb8318d34f1bbf49d28f6e400f60 (ORCL:CLUS3) [CLUSTERDG]
Located 3 voting disk(s).

2. Corrupt one of the disks to simulate disk failure

/etc/init.d/oracleasm querydisk -p CLUS1
Disk "CLUS1" is a valid ASM disk
/dev/sdc2: LABEL="CLUS1" TYPE="oracleasm"

dd if=/dev/zero of=/dev/sdc2 count=20480 bs=8192
20480+0 records in
20480+0 records out
167772160 bytes (168 MB) copied, 0.162996 seconds, 1.0 GB/s

3. This has no effect on the OCR. In this case OCR is stored in a block device (not supported by Oracle)

ocrcheck
Status of Oracle Cluster Registry is as follows :
        Version                  :          3
        Total space (kbytes)     :     148348
        Used space (kbytes)      :       4580
        Available space (kbytes) :     143768
        ID                       :  552644455
        Device/File Name         :  /dev/sdc5
                                   Device/File integrity check succeeded

                                   Device/File not configured

                                   Device/File not configured

                                   Device/File not configured

                                   Device/File not configured

        Cluster registry integrity check succeeded
        Logical corruption check succeeded

4. Grid Infrastructure alert log ($CRS_HOME/log/`hostname -s`/alert`hostname -s`.log) shows the following entry after some time

2010-08-19 15:51:56.537
[cssd(25119)]CRS-1605:CSSD voting file is online: ORCL:CLUS1; details in /opt/app/11.2.0/grid/log/hpc1/cssd/ocssd.log.

5. OCSSD log list the following

2010-08-19 15:51:44.790: [    CSSD][1230346560]clssnmSendingThread: sending status msg to all nodes
2010-08-19 15:51:44.790: [    CSSD][1230346560]clssnmSendingThread: sent 5 status msgs to all nodes
....
2010-08-19 15:51:49.790: [    CSSD][1230346560]clssnmSendingThread: sending status msg to all nodes
2010-08-19 15:51:49.790: [    CSSD][1230346560]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-19 15:51:53.513: [    CSSD][1303775552]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-19 15:51:53.513: [    CSSD][1303775552]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-08-19 15:51:53.816: [    CLSF][1366714688]Closing handle:0x1caa6570
2010-08-19 15:51:53.816: [   SKGFD][1366714688]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x1ccf45c0 for
disk :ORCL:CLUS1:
2010-08-19 15:51:54.515: [    CLSF][1303775552]Closing handle:0x1cd496d0
2010-08-19 15:51:54.515: [   SKGFD][1303775552]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x1cd48e80 for
disk :ORCL:CLUS1:
2010-08-19 15:51:54.516: [    CLSF][1324755264]Closing handle:0x1c958930
2010-08-19 15:51:54.516: [   SKGFD][1324755264]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x1cac17c0 for
disk :ORCL:CLUS1:
2010-08-19 15:51:54.790: [    CSSD][1230346560]clssnmSendingThread: sending status msg to all nodes
2010-08-19 15:51:54.790: [    CSSD][1230346560]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-19 15:51:54.799: [    CSSD][1146427712]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 2300 > margin 1500 cur_ms 527387044 lastalive 527384744
2010-08-19 15:51:56.523: [    CSSD][1324755264]clssnmvDiskOpen: Opening ORCL:CLUS1
2010-08-19 15:51:56.523: [   SKGFD][1324755264]Handle 0x1cac17c0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:
2010-08-19 15:51:56.523: [    CLSF][1324755264]Opened hdl:0x1ca4d320 for dev:ORCL:CLUS1:
2010-08-19 15:51:56.533: [    CSSD][1324755264]clssnmvStatusBlkInit: myinfo nodename hpc1, uniqueness 1281715015
2010-08-19 15:51:56.533: [    CSSD][1324755264]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now online
2010-08-19 15:51:56.533: [   SKGFD][1366714688]Handle 0x1cd48e80 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:
2010-08-19 15:51:56.533: [    CLSF][1366714688]Opened hdl:0x1cba4ac0 for dev:ORCL:CLUS1:
2010-08-19 15:51:57.524: [   SKGFD][1303775552]Handle 0x1ccf45c0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:
2010-08-19 15:51:57.524: [    CLSF][1303775552]Opened hdl:0x1cd496d0 for dev:ORCL:CLUS1:
2010-08-19 15:51:59.790: [    CSSD][1230346560]clssnmSendingThread: sending status msg to all nodes
2010-08-19 15:51:59.790: [    CSSD][1230346560]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-19 15:52:03.403: [    CSSD][1146427712]clssscSelect: cookie accept request 0x1c75c298
2010-08-19 15:52:03.403: [    CSSD][1146427712]clssgmAllocProc: (0x2aaab421d380) allocated
2010-08-19 15:52:03.403: [    CSSD][1146427712]clssgmClientConnectMsg: properties of cmProc 0x2aaab421d380 - 1,2,3,4
2010-08-19 15:52:03.403: [    CSSD][1146427712]clssgmClientConnectMsg: Connect from con(0x41ee9f) proc(0x2aaab421d380) pid(21407)
version 11:2:1:4, properties: 1,2,3,4
2010-08-19 15:52:03.403: [    CSSD][1146427712]clssgmClientConnectMsg: msg flags 0x0000
2010-08-19 15:52:03.404: [    CSSD][1146427712]clssgmExecuteClientRequest: Node name request from client ((nil))
2010-08-19 15:52:03.404: [    CSSD][1146427712]clssgmExecuteClientRequest: VOTEDISKQUERY recvd from proc 29 (0x2aaab421d380)
2010-08-19 15:52:03.405: [    CSSD][1146427712]clssgmDeadProc: proc 0x2aaab421d380
2010-08-19 15:52:03.405: [    CSSD][1146427712]clssgmDestroyProc: cleaning up proc(0x2aaab421d380) con(0x41ee9f) skgpid  ospid 21407 with 0 clients, refcount 0
2010-08-19 15:52:03.405: [    CSSD][1146427712]clssgmDiscEndpcl: gipcDestroy 0x41ee9f
2010-08-19 15:52:04.790: [    CSSD][1230346560]clssnmSendingThread: sending status msg to all nodes
2010-08-19 15:52:04.790: [    CSSD][1230346560]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-19 15:52:09.790: [    CSSD][1230346560]clssnmSendingThread: sending status msg to all nodes
2010-08-19 15:52:09.790: [    CSSD][1230346560]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-19 15:52:14.790: [    CSSD][1230346560]clssnmSendingThread: sending status msg to all nodes
2010-08-19 15:52:14.790: [    CSSD][1230346560]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-19 15:52:18.260: [    CSSD][1146427712]clssscSelect: cookie accept request 0x1c75c298
2010-08-19 15:52:18.260: [    CSSD][1146427712]clssgmAllocProc: (0x2aaab4261540) allocated
2010-08-19 15:52:18.260: [    CSSD][1146427712]clssgmClientConnectMsg: properties of cmProc 0x2aaab4261540 - 1,2,3,4
2010-08-19 15:52:18.260: [    CSSD][1146427712]clssgmClientConnectMsg: Connect from con(0x41ef82) proc(0x2aaab4261540) pid(21428)
version 11:2:1:4, properties: 1,2,3,4
2010-08-19 15:52:18.260: [    CSSD][1146427712]clssgmClientConnectMsg: msg flags 0x0000
2010-08-19 15:52:18.261: [    CSSD][1146427712]clssgmExecuteClientRequest: Node name request from client ((nil))
2010-08-19 15:52:18.262: [    CSSD][1146427712]clssgmExecuteClientRequest: VOTEDISKQUERY recvd from proc 29 (0x2aaab4261540)
2010-08-19 15:52:18.263: [    CSSD][1146427712]clssgmDeadProc: proc 0x2aaab4261540
2010-08-19 15:52:18.263: [    CSSD][1146427712]clssgmDestroyProc: cleaning up proc(0x2aaab4261540) con(0x41ef82) skgpid  ospid 21428 with 0 clients, refcount 0
2010-08-19 15:52:18.263: [    CSSD][1146427712]clssgmDiscEndpcl: gipcDestroy 0x41ef82
2010-08-19 15:52:18.788: [    CSSD][1230346560]clssnmSendingThread: sending status msg to all nodes
2010-08-19 15:52:18.788: [    CSSD][1230346560]clssnmSendingThread: sent 4 status msgs to all nodes
2010-08-19 15:52:23.788: [    CSSD][1230346560]clssnmSendingThread: sending status msg to all nodes

Querying vote disk will show all three vote disks to be online

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   393aebdd039e4fdabf242bf461c45136 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE   04daf60741d34f9abf22e76fe22913b2 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE   2f38bb8318d34f1bbf49d28f6e400f60 (ORCL:CLUS3) [CLUSTERDG]

6. Stop and start the crs to see the effects of a restart

crsctl stop crs
crsctl start crs

While restarting the ocssd.log will show discovering on two vote disks instead of three

2010-08-19 15:58:12.758: [    CSSD][1145928000]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-19 15:58:12.758: [    CSSD][1145928000]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-19 15:58:12.758: [    CSSD][1145928000]clssnmvDiskVerify: Successful discovery of 2 disks
2010-08-19 15:58:12.758: [    CSSD][1145928000]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2010-08-19 15:58:12.758: [    CSSD][1145928000]clssnmCompleteVFDiscovery: Completing voting file discovery

and also the crs alert log

2010-08-19 15:58:22.999
[cssd(6264)]CRS-1605:CSSD voting file is online: ORCL:CLUS3; details in /opt/app/11.2.0/grid/log/hpc1/cssd/ocssd.log.
2010-08-19 15:58:23.035
[cssd(6264)]CRS-1605:CSSD voting file is online: ORCL:CLUS2; details in /opt/app/11.2.0/grid/log/hpc1/cssd/ocssd.log.

At the end cluster stack will start but only with two vote disks

crsctl check css
CRS-4529: Cluster Synchronization Services is online
[root@hpc1 oracle +ASM1]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online                       

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. OFFLINE  393aebdd039e4fdabf242bf461c45136 () []
2. ONLINE   04daf60741d34f9abf22e76fe22913b2 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE   2f38bb8318d34f1bbf49d28f6e400f60 (ORCL:CLUS3) [CLUSTERDG]

7. The diskgroup containing the vote disk will be in a dismounted state after the restart.

select name,state from v$asm_diskgroup;
STATE       NAME
----------- ---------
DISMOUNTED  CLUSTERDG

From crsctl stat

HA Resource                                   Target     State
-----------                                   ------     -----
ora.CLUSTERDG.dg                              ONLINE     OFFLINE
ora.DATA.dg                                   ONLINE     ONLINE on hpc1
ora.FLASH.dg                                  ONLINE     ONLINE on hpc1

8. Reason for the diskgroup not beign mounted is the failed disk. Following from the ASM alert log

SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
NOTE: Diskgroup used for Voting files is:
        CLUSTERDG
NOTE: cache registered group CLUSTERDG number=1 incarn=0x909ef75f
NOTE: cache began mount (first) of group CLUSTERDG number=1 incarn=0x909ef75f
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Assigning number (1,1) to disk (ORCL:CLUS2)
NOTE: Assigning number (1,2) to disk (ORCL:CLUS3)
NOTE: group CLUSTERDG: updated PST location: disk 0001 (PST copy 0)
NOTE: group CLUSTERDG: updated PST location: disk 0002 (PST copy 1)
NOTE: group CLUSTERDG: updated PST location: disk 0001 (PST copy 0)
NOTE: group CLUSTERDG: updated PST location: disk 0002 (PST copy 1)
NOTE: start heartbeating (grp 1)
kfdp_query(CLUSTERDG): 3
kfdp_queryBg(): 3
NOTE: group CLUSTERDG: updated PST location: disk 0001 (PST copy 0)
NOTE: group CLUSTERDG: updated PST location: disk 0002 (PST copy 1)
NOTE: Assigning number (1,0) to disk ()
kfdp_query(CLUSTERDG): 4
kfdp_queryBg(): 4
NOTE: group CLUSTERDG: updated PST location: disk 0001 (PST copy 0)
NOTE: group CLUSTERDG: updated PST location: disk 0002 (PST copy 1)
NOTE: cache dismounting (clean) group 1/0x909EF75F (CLUSTERDG)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x909EF75F (CLUSTERDG)
NOTE: cache ending mount (fail) of group CLUSTERDG number=1 incarn=0x909ef75f
kfdp_dismount(): 5
kfdp_dismountBg(): 5
NOTE: De-assigning number (1,0) from disk ()
NOTE: De-assigning number (1,1) from disk (ORCL:CLUS2)
NOTE: De-assigning number (1,2) from disk (ORCL:CLUS3)
ERROR: diskgroup CLUSTERDG was not mounted
NOTE: cache deleting context for group CLUSTERDG 1/-1868630177
WARNING: Disk Group CLUSTERDG containing voting files is not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "0" is missing from group number "1"
ERROR: ALTER DISKGROUP ALL MOUNT /* asm agent */

Since normal redundancy diskgroups can tolarate a failure of one failure group this diskgroup could be mounted with force option.

SQL> alter diskgroup clusterdg mount force;

Diskgroup altered.

9. Once the diskgroup is mounted drop the failed disk, repair and add to the diskgroup and mount the disk group

SQL> alter diskgroup clusterdg drop disk CLUS1;
alter diskgroup clusterdg drop disk CLUS1
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15084: ASM disk "CLUS1" is offline and cannot be dropped.


SQL> alter diskgroup clusterdg drop disk CLUS1 force;

Diskgroup altered.                                  

# /etc/init.d/oracleasm deletedisk clus1
Removing ASM disk "clus1":                                 [  OK  ]
# /etc/init.d/oracleasm createdisk clus1 /dev/sdc2
Marking disk "clus1" as an ASM disk:                       [  OK  ]

alter diskgroup clusterdg add failgroup fail1 disk 'ORCL:CLUS1';

10. Verify the new disk being added as a vote disk. On the ASM alert log

Thu Aug 19 16:08:41 2010
ARB0 started with pid=26, OS id=7771
NOTE: assigning ARB0 to group 3/0xa62ef779 (CLUSTERDG)
NOTE: F1X0 copy 3 relocating from 65534:4294967294 to 3:2 for diskgroup 3 (CLUSTERDG)
NOTE: Voting file relocation is required in diskgroup CLUSTERDG
NOTE: Attempting voting file relocation on diskgroup CLUSTERDG
NOTE: voting file allocation on grp 3 disk CLUS1

Querying the vote disk

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. OFFLINE  393aebdd039e4fdabf242bf461c45136 () []
 2. ONLINE   04daf60741d34f9abf22e76fe22913b2 (ORCL:CLUS2) [CLUSTERDG]
 3. ONLINE   2f38bb8318d34f1bbf49d28f6e400f60 (ORCL:CLUS3) [CLUSTERDG]
 4. ONLINE   8a222576839d4f74bfa6f5949788c521 (ORCL:CLUS1) [CLUSTERDG]

The disk sequence numbers have changed to reflect the new order.

11. Move the vote disks to another diskgroup or block device and back to drop the offline vote disk (add and delete options on crsctl are not valid when vote disk is in ASM)

crsctl replace votedisk +DATA

crsctl replace votedisk /dev/sdc6

Now formatting voting disk: /dev/sdc6.
CRS-4256: Updating the profile
Successful addition of voting disk 13f010bf72264f46bfb5aa1a50ce05a3.
Successful deletion of voting disk 393aebdd039e4fdabf242bf461c45136.
Successful deletion of voting disk 04daf60741d34f9abf22e76fe22913b2.
Successful deletion of voting disk 2f38bb8318d34f1bbf49d28f6e400f60.
Successful deletion of voting disk 8a222576839d4f74bfa6f5949788c521.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
[oracle@hpc1 ~ clusdb1]$ crsctl replace votedisk +CLUSTERDG
CRS-4256: Updating the profile
Successful addition of voting disk 1b6460e7b9394f96bf7cb3d04b9db909
Successful addition of voting disk 9bf04ee41bda4fd0bf90cee544a1c7f3
Successful addition of voting disk 3f77a85b0e274fb8bfb58f5190073867.
Successful deletion of voting disk 13f010bf72264f46bfb5aa1a50ce05a3.
Successfully replaced voting disk group with +CLUSTERDG.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   1b6460e7b9394f96bf7cb3d04b9db909 (ORCL:CLUS2) [CLUSTERDG]
 2. ONLINE   9bf04ee41bda4fd0bf90cee544a1c7f3 (ORCL:CLUS3) [CLUSTERDG]
 3. ONLINE   3f77a85b0e274fb8bfb58f5190073867 (ORCL:CLUS1) [CLUSTERDG]

Useful Metalink note
How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]

A! Help

Labels

Friday, August 27, 2010

"Cache Fusion" Behavior for Result Cache

Wednesday, August 25, 2010

Restoring OCR & Vote disk due to ASM disk failures - 1

Restoring Vote disk due to ASM disk failures - 3

Tuesday, August 24, 2010

Restoring Vote disk due to ASM disk failures - 2

Real Time SQL Monitoring to Capture Bind Variables Values

Monday, August 23, 2010

Restoring Vote disk due to ASM disk failures - 1

About Me

Downloads

Quick Response

Popular

Blog Archive

Total Pageviews

Followers

Oracle Documentation