A! Help: Restoring Vote disk due to ASM disk failures

The previous post covered the failure of one failure group in the ASM diskgroup where the vote disks reside. This post covers the case where two failure groups are affected.

Scenario 2.
1. Only Vote disks are in ASM diskgroup
2. ASM diskgroup has normal redundancy with only three failure groups
3. Only two failure groups are affected
4. OCR is stored in a separate location (another diskgroup or a block device; block devices are not supported by Oracle and are valid only during migration, but the OCR can be moved to one after installation for testing purposes)

1. It is assumed that the vote disks are already in an ASM diskgroup; if not, move them to an ASM diskgroup. The current vote disk configuration is as follows

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   8ac9c8c3ed694f3cbf36ded6f9587a24 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE   7610f9e2fc134fdabf97e758e7f017d2 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE   7a1a2e08ea384f20bf8be3a54e389622 (ORCL:CLUS3) [CLUSTERDG]
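The listing above can also be checked programmatically, which is handy for comparing the state before and after the failure. What follows is only a rough sketch: the regular expression and the function name parse_votedisks are my own, and the row layout is assumed to match the output shown in this post.

```python
import re

# Sample text copied from the query output above.
SAMPLE = """\
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   8ac9c8c3ed694f3cbf36ded6f9587a24 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE   7610f9e2fc134fdabf97e758e7f017d2 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE   7a1a2e08ea384f20bf8be3a54e389622 (ORCL:CLUS3) [CLUSTERDG]
"""

# Assumed row layout: number, state, 32 hex digit id, (file name), [disk group].
ROW = re.compile(
    r"^\s*(\d+)\.\s+(\w+)\s+([0-9a-f]{32})\s+\(([^)]*)\)\s+\[([^\]]*)\]\s*$"
)


def parse_votedisks(text):
    """Return one dict per voting file row; header and footer lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            num, state, fuid, name, diskgroup = match.groups()
            rows.append({
                "num": int(num),
                "state": state,
                "fuid": fuid,
                "file": name,
                "diskgroup": diskgroup,
            })
    return rows


disks = parse_votedisks(SAMPLE)
online = [d["file"] for d in disks if d["state"] == "ONLINE"]
print(len(disks), "voting files,", len(online), "online:", online)
```

An OFFLINE row with an empty file name and disk group, as seen later in this post, is matched by the same pattern and comes back with empty strings in those fields.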

2. Identify two disks in the diskgroup where the vote disks reside and corrupt them to simulate disk failure

# /etc/init.d/oracleasm querydisk -p clus1
Disk "CLUS1" is a valid ASM disk
/dev/sdc2: LABEL="CLUS1" TYPE="oracleasm"
# /etc/init.d/oracleasm querydisk -p clus2
Disk "CLUS2" is a valid ASM disk
/dev/sdc3: LABEL="CLUS2" TYPE="oracleasm"          

dd if=/dev/zero of=/dev/sdc2 count=20480 bs=8192
20480+0 records in
20480+0 records out
167772160 bytes (168 MB) copied, 0.158475 seconds, 1.1 GB/s

dd if=/dev/zero of=/dev/sdc3 count=20480 bs=8192
20480+0 records in
20480+0 records out
167772160 bytes (168 MB) copied, 0.160278 seconds, 1.0 GB/s
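As a quick sanity check on the dd figures above, the amount zeroed at the start of each partition is simply count times block size. A tiny sketch of the arithmetic (the variable names are only illustrative):

```python
# Values taken from the dd commands above.
count = 20480   # number of blocks written
bs = 8192       # block size in bytes

total = count * bs
print(total)                 # bytes written, as reported by dd
print(round(total / 10**6))  # decimal megabytes, the unit dd prints as "MB"
print(total // 2**20)        # binary mebibytes
```

So each dd run overwrites the first 160 MiB of the partition, which dd reports as 168 MB because it uses decimal megabytes.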

3. ocssd.log will have the following entries showing the detection of vote disk corruption for both disks

2010-08-24 13:13:26.362: [    CSSD][1261078848]clssnmSendingThread: sending status msg to all nodes
2010-08-24 13:13:26.362: [    CSSD][1261078848]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-24 13:13:30.640: [    CSSD][1418426688]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-24 13:13:30.640: [    CSSD][1418426688]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-08-24 13:13:31.052: [    CLSF][1397446976]Closing handle:0x2aaab00f6b90
2010-08-24 13:13:31.052: [   SKGFD][1397446976]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab0064e90
for disk :ORCL:CLUS1:

2010-08-24 13:13:31.372: [    CSSD][1261078848]clssnmSendingThread: sending status msg to all nodes
2010-08-24 13:13:31.372: [    CSSD][1261078848]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-24 13:13:31.380: [    CSSD][1135200576]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS3 sched delay 1560 > margin 1500 cur_ms 949819404 lastalive 949817844
2010-08-24 13:13:31.478: [    CLSF][1365977408]Closing handle:0x13bf5d0
2010-08-24 13:13:31.478: [   SKGFD][1365977408]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x1943e00 for
disk :ORCL:CLUS1:

2010-08-24 13:13:31.478: [    CSSD][1365977408]clssnmvScanCompletions: completed 6 items
2010-08-24 13:13:31.642: [    CLSF][1418426688]Closing handle:0x2aaab830a8f0
2010-08-24 13:13:31.642: [   SKGFD][1418426688]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab805ac60
for disk :ORCL:CLUS1:

2010-08-24 13:13:32.932: [    CSSD][1135200576]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS1 sched delay 2880 > margin 1500 cur_ms 949820954 lastalive 949818074
2010-08-24 13:13:34.075: [    CSSD][1397446976]clssnmvDiskOpen: Opening ORCL:CLUS1
2010-08-24 13:13:34.075: [   SKGFD][1397446976]Handle 0x2aaab0064e90 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-24 13:13:34.075: [    CLSF][1397446976]Opened hdl:0x2aaab0002110 for dev:ORCL:CLUS1:
2010-08-24 13:13:34.093: [    CSSD][1397446976]clssnmvStatusBlkInit: myinfo nodename hpc1, uniqueness 1282573835
2010-08-24 13:13:34.093: [    CSSD][1397446976]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now online
2010-08-24 13:13:34.093: [   SKGFD][1365977408]Handle 0x18ee190 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-24 13:13:34.093: [    CLSF][1365977408]Opened hdl:0x1835260 for dev:ORCL:CLUS1:
2010-08-24 13:13:34.655: [   SKGFD][1418426688]Handle 0x2aaab805ac60 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS1:

2010-08-24 13:13:34.655: [    CLSF][1418426688]Opened hdl:0x2aaab8085e90 for dev:ORCL:CLUS1:
2010-08-24 13:13:34.666: [    CSSD][1418426688]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-24 13:13:34.666: [    CSSD][1418426688]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-08-24 13:13:34.867: [    CSSD][1397446976]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now online 
2010-08-24 13:13:36.372: [    CSSD][1261078848]clssnmSendingThread: sending status msg to all nodes
2010-08-24 13:13:36.372: [    CSSD][1261078848]clssnmSendingThread: sent 5 status msgs to all nodes
...
2010-08-24 13:14:25.608: [    CSSD][1135200576]clssgmRegisterShared: global grock DBCLUSDB member 0 share type 1, refcount 13
2010-08-24 13:14:25.609: [    CSSD][1135200576]clssgmExecuteClientRequest: Node name request from client ((nil))
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssscSelect: cookie accept request 0x2aaaac0faea0
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssscevtypSHRCON: getting client with cmproc 0x2aaaac0faea0
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssgmRegisterClient: proc(33/0x2aaaac0faea0), client(2/0x2aaaac105330)
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssgmExecuteClientRequest: GRPSHREG recvd from client 2 (0x2aaaac105330)
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssgmRegisterShared: grp DG_LOCAL_DATA, mbr 0, type 1
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssgmQueueShare: (0x2aaaac183a90) target local grock DG_LOCAL_DATA member 0 type
1 queued from client (0x2aaaac105330), local grock DG_LOCAL_DATA, refcount 11
2010-08-24 13:14:25.610: [    CSSD][1135200576]clssgmRegisterShared: local grock DG_LOCAL_DATA member 0 share type 1, refcount 11
2010-08-24 13:14:30.139: [    CSSD][1166670144]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS2)
2010-08-24 13:14:30.139: [    CSSD][1166670144]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now offline
2010-08-24 13:14:30.190: [    CLSF][1156180288]Closing handle:0x2aaab0009a20
2010-08-24 13:14:30.190: [   SKGFD][1156180288]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab02179d0
for disk :ORCL:CLUS2:
2010-08-24 13:14:30.380: [    CSSD][1261078848]clssnmSendingThread: sending status msg to all nodes
2010-08-24 13:14:30.380: [    CSSD][1261078848]clssnmSendingThread: sent 5 status msgs to all nodes
2010-08-24 13:14:30.594: [    CLSF][1187649856]Closing handle:0x2aaab809bae0
2010-08-24 13:14:30.594: [   SKGFD][1187649856]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab80e6340
for disk :ORCL:CLUS2:

2010-08-24 13:14:31.140: [    CLSF][1166670144]Closing handle:0x2aaaac19fac0
2010-08-24 13:14:31.140: [   SKGFD][1166670144]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaaac188e90
for disk :ORCL:CLUS2:

2010-08-24 13:14:31.299: [    CSSD][1135200576]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS2 sched delay 2110 > margin 1500 cur_ms 949879314 lastalive 949877204
2010-08-24 13:14:33.202: [    CSSD][1156180288]clssnmvDiskOpen: Opening ORCL:CLUS2
2010-08-24 13:14:33.202: [   SKGFD][1156180288]Handle 0x2aaab02179d0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS2:

2010-08-24 13:14:33.202: [    CLSF][1156180288]Opened hdl:0x2aaab0101b50 for dev:ORCL:CLUS2:
2010-08-24 13:14:33.236: [    CSSD][1156180288]clssnmvStatusBlkInit: myinfo nodename hpc1, uniqueness 1282573835
2010-08-24 13:14:33.236: [    CSSD][1156180288]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now online
2010-08-24 13:14:33.236: [   SKGFD][1187649856]Handle 0x2aaab82018a0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS2:

2010-08-24 13:14:33.236: [    CLSF][1187649856]Opened hdl:0x2aaab8202320 for dev:ORCL:CLUS2:
2010-08-24 13:14:33.379: [    CSSD][1135200576]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:CLUS2 sched delay 4190 > margin 1500 cur_ms 949881394 lastalive 949877204
2010-08-24 13:14:34.081: [   SKGFD][1166670144]Handle 0x2aaaac188e90 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS2:

2010-08-24 13:14:34.081: [    CLSF][1166670144]Opened hdl:0x2aaaac1aada0 for dev:ORCL:CLUS2:
2010-08-24 13:14:34.113: [    CSSD][1166670144]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS2)
2010-08-24 13:14:34.113: [    CSSD][1166670144]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now offline
2010-08-24 13:14:34.113: [    CSSD][1156180288]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now online
2010-08-24 13:14:35.380: [    CSSD][1261078848]clssnmSendingThread: sending status msg to all nodes
2010-08-24 13:14:35.380: [    CSSD][1261078848]clssnmSendingThread: sent 5 status msgs to all nodes
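The log is noisy, so it can help to pull out only the corruption messages. Below is a minimal sketch that records when each voting file was first reported as corrupted. The pattern is based solely on the clssnmvDiskKillCheck lines shown above, and the function name first_corruption is made up for this example; other Grid Infrastructure versions may format these lines differently.

```python
import re

# A few lines copied from the ocssd.log excerpt above.
SAMPLE = """\
2010-08-24 13:13:30.640: [    CSSD][1418426688]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-24 13:13:30.640: [    CSSD][1418426688]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-08-24 13:13:34.666: [    CSSD][1418426688]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-08-24 13:14:30.139: [    CSSD][1166670144]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS2)
2010-08-24 13:14:30.139: [    CSSD][1166670144]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS2 now offline
"""

# Captures the timestamp and the disk name of a corruption message.
CORRUPT = re.compile(
    r"^(\S+ \S+): \[\s*CSSD\]\[\d+\]clssnmvDiskKillCheck: "
    r"voting disk corrupted \([^)]*\) \(([^)]+)\)"
)


def first_corruption(text):
    """Map each disk name to the timestamp of its first corruption message."""
    first = {}
    for line in text.splitlines():
        match = CORRUPT.match(line)
        if match:
            stamp, disk = match.groups()
            first.setdefault(disk, stamp)  # keep only the earliest entry
    return first


for disk, stamp in first_corruption(SAMPLE).items():
    print(disk, "first reported corrupted at", stamp)
```

For a real log the text would be read from the ocssd.log file instead of the SAMPLE string.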

4. Stop the crs and try to start it again; the start will fail. ocssd.log will list the reason

crsctl stop crs
crsctl start crs

2010-08-24 13:18:29.130: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.130: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.130: [    CSSD][1149827392]clssnmvDiskVerify: discovered a potential voting file
2010-08-24 13:18:29.130: [   SKGFD][1149827392]Handle 0x17870510 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:CLUS3:

2010-08-24 13:18:29.130: [    CLSF][1149827392]Opened hdl:0x1780db80 for dev:ORCL:CLUS3:
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: Successful discovery for disk ORCL:CLUS3, UID 7a1a2e08-ea384f20-bf8be3a5-4e389622, Pending CIN 0:1282650882:0, Committed CIN 0:1282650882:0
2010-08-24 13:18:29.140: [    CLSF][1149827392]Closing handle:0x1780db80
2010-08-24 13:18:29.140: [   SKGFD][1149827392]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x17870510 for
disk :ORCL:CLUS3:

2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskVerify: Successful discovery of 1 disks
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmCompleteVFDiscovery: Completing voting file discovery
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskStateChange: state from discovered to pending disk ORCL:CLUS3
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvDiskStateChange: state from pending to configured disk ORCL:CLUS3
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssnmvVerifyCommittedConfigVFs: Insufficient voting files found, found 1 of 3 configured, needed 2 voting files
2010-08-24 13:18:29.140: [    CSSD][1149827392](:CSSNM00020:)clssnmvVerifyCommittedConfigVFs: voting file 0, id 8ac9c8c3-ed694f3c-bf36ded6-f9587a24 not found
2010-08-24 13:18:29.140: [    CSSD][1149827392](:CSSNM00020:)clssnmvVerifyCommittedConfigVFs: voting file 1, id 7610f9e2-fc134fda-bf97e758-e7f017d2 not found
2010-08-24 13:18:29.140: [    CSSD][1149827392]ASSERT clssnm1.c 2829
2010-08-24 13:18:29.140: [    CSSD][1149827392](:CSSNM00021:)clssnmCompleteVFDiscovery: Found 1 voting files, but 2 are required.
Terminating due to insufficient configured voting files
2010-08-24 13:18:29.140: [    CSSD][1149827392]###################################
2010-08-24 13:18:29.140: [    CSSD][1149827392]clssscExit: CSSD aborting from thread clssnmvDDiscThread
2010-08-24 13:18:29.140: [    CSSD][1149827392]###################################
2010-08-24 13:18:29.141: [    CSSD][1149827392]

Cluster alert log will also have information on the missing vote disks

2010-08-24 13:18:29.140
[cssd(13590)]CRS-1637:Unable to locate configured voting file with ID 8ac9c8c3-ed694f3c-bf36ded6-f9587a24; details at (:CSSNM00020:) in /opt/app/11.2.0/grid/log/hpc1/cssd/ocssd.log
2010-08-24 13:18:29.140
[cssd(13590)]CRS-1637:Unable to locate configured voting file with ID 7610f9e2-fc134fda-bf97e758-e7f017d2; details at (:CSSNM00020:) in /opt/app/11.2.0/grid/log/hpc1/cssd/ocssd.log
2010-08-24 13:18:29.140
[cssd(13590)]CRS-1705:Found 1 configured voting files but 2 voting files are required, terminating to ensure data integrity; details at (:CSSNM00021:) in /opt/app/11.2.0/grid/log/hpc1/cssd/ocssd.log
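The "found 1 of 3 configured, needed 2" message is consistent with a simple majority rule: a node must see more than half of the configured voting files. The sketch below only illustrates that arithmetic under this assumption; the function names are mine and it is not how CSSD is implemented.

```python
def required_voting_files(configured):
    """Smallest number of voting files that is a strict majority."""
    return configured // 2 + 1


def can_start(configured, found):
    """True if enough voting files were discovered to form a majority."""
    return found >= required_voting_files(configured)


# The situation in this post: 3 configured, only ORCL:CLUS3 found.
print(required_voting_files(3))       # files needed
print(can_start(3, 1))                # two failure groups lost
print(can_start(3, 2))                # one failure group lost
```

With three voting files, losing one failure group still leaves a majority, while losing two does not, which is why this scenario needs the exclusive mode start used in the next steps.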

Checking crs status will show that only Oracle High Availability Services is online

crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

5. Stop the crs on all nodes (use the -f force option if required) and start crs on one node in exclusive mode, which doesn't require the use of vote disks.

crsctl stop crs -f
crsctl start crs -excl

6. Once started, query the vote disks

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. OFFLINE  8ac9c8c3ed694f3cbf36ded6f9587a24 () []
2. OFFLINE  7610f9e2fc134fdabf97e758e7f017d2 () []
3. ONLINE   7a1a2e08ea384f20bf8be3a54e389622 (ORCL:CLUS3) [CLUSTERDG]
Located 3 voting disk(s).

7. There are several ways to restore the vote disks from this point onwards. Shown first is creating a new diskgroup and moving the vote disks to it. It is also possible to repair the existing disks and reuse them, which is shown later on. At this stage the ASM instance is not started; start it so that a diskgroup can be created.

sqlplus  / as sysasm

SQL> startup mount
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2212656 bytes
Variable Size             256552144 bytes
ASM Cache                  25165824 bytes
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CLUSTERDG" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup
"CLUSTERDG"

From ASM alert log

Tue Aug 24 13:28:03 2010
SQL> ALTER DISKGROUP ALL MOUNT
NOTE: Diskgroup used for Voting files is:
       CLUSTERDG
NOTE: cache registered group CLUSTERDG number=1 incarn=0xd5443b06
NOTE: cache began mount (first) of group CLUSTERDG number=1 incarn=0xd5443b06
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Assigning number (1,2) to disk (ORCL:CLUS3)
ERROR: no PST quorum in group: required 2, found 1
NOTE: cache dismounting (clean) group 1/0xD5443B06 (CLUSTERDG)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xD5443B06 (CLUSTERDG)
NOTE: cache ending mount (fail) of group CLUSTERDG number=1 incarn=0xd5443b06
kfdp_dismount(): 2
kfdp_dismountBg(): 2
NOTE: De-assigning number (1,2) from disk (ORCL:CLUS3)
ERROR: diskgroup CLUSTERDG was not mounted
NOTE: cache deleting context for group CLUSTERDG 1/-716948730
WARNING: Disk Group CLUSTERDG containing voting files is not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CLUSTERDG" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "CLUSTERDG"
ERROR: ALTER DISKGROUP ALL MOUNT

Since two failure groups were affected, the diskgroup holding the vote disks won't mount. The other diskgroups are not mounted either and must be mounted explicitly. But since the ASM instance is up, a new diskgroup can be created.

8. Create a new diskgroup and move the vote disks.

create diskgroup votedg quorum failgroup fail1 disk 'ORCL:RED1' failgroup fail2 disk 'ORCL:RED2' failgroup fail3 disk 'ORCL:RED3' attribute 'compatible.asm'='11.2';

crsctl replace votedisk +votedg
Successful addition of voting disk a24f198797e64f78bff42b8721b964d2
Successful addition of voting disk 9d33c71328c74f2fbf86d3e5a078b648
Successful addition of voting disk 88b2c9201e9e4f4ebf0732813b5be8f1

Successful deletion of voting disk 8ac9c8c3ed694f3cbf36ded6f9587a24.
Successful deletion of voting disk 7610f9e2fc134fdabf97e758e7f017d2.
Successful deletion of voting disk 7a1a2e08ea384f20bf8be3a54e389622.
Successfully replaced voting disk group with +votedg.
CRS-4266: Voting file(s) successfully replaced      

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   a24f198797e64f78bff42b8721b964d2 (ORCL:RED1) [VOTEDG]
2. ONLINE   9d33c71328c74f2fbf86d3e5a078b648 (ORCL:RED2) [VOTEDG]
3. ONLINE   88b2c9201e9e4f4ebf0732813b5be8f1 (ORCL:RED3) [VOTEDG]
Located 3 voting disk(s).

"A quorum failure group is a special type of failure group and disks in these failure groups do not contain user data and are not considered when determining redundancy requirements" (Storage Admin Guide).

9. Stop crs on the node started in exclusive mode and start the crs on all nodes as normal

crsctl stop crs
crsctl start crs

10. The old diskgroup name will still be there in the ASM instance and may need some cleanup. Since a normal redundancy diskgroup can only tolerate the failure of one failure group, the diskgroup in question won't mount even with the force option.

"Oracle ASM provides a MOUNT FORCE option with ALTER DISKGROUP to enable Oracle ASM disk groups to be mounted in normal or high redundancy modes even though some Oracle ASM disks may be unavailable to the disk group at mount time.

The default behavior without the FORCE option is to fail to mount a disk group that has damaged or missing disks.

The MOUNT FORCE option is useful in situations where a disk is temporarily unavailable and you want to mount the disk group with reduced redundancy while you correct the situation that caused the outage.

To successfully mount with the MOUNT FORCE option, Oracle ASM must be able to find at least one copy of the extents for all of the files in the disk group. In this case, Oracle ASM can successfully mount the disk group, but with potentially reduced redundancy.

In clustered ASM environments, if an ASM instance is not the first instance to mount the disk group, then using the MOUNT FORCE statement fails. This is because the disks have been accessed by another instance and the disks are not locally accessible.

The FORCE option corrects configuration errors, such as incorrect values for ASM_DISKSTRING, without incurring unnecessary rebalance operations. Disk groups mounted with the FORCE option have one or more disks offline if the disks were not available at the time of the mount. You must take corrective action to restore those devices before the time set with the DISK_REPAIR_TIME value expires. Failing to restore and put those disks back online within the disk repair time frame results in Oracle ASM automatically removing the disks from the disk group." (Storage Admin Guide)

SQL> alter diskgroup clusterdg mount force;
alter diskgroup clusterdg mount force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CLUSTERDG" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup
"CLUSTERDG"

The solution is to forcefully assign the surviving disk to a new diskgroup and drop it later. "Caution: Use extreme care when using the FORCE option because the Oracle ASM instance does not verify whether the disk group is used by any other Oracle ASM instance before Oracle ASM deletes the disk group" (Storage Admin Guide)

SQL> create diskgroup dummydg disk 'ORCL:CLUS1' DISK 'ORCL:CLUS2' DISK 'ORCL:CLUS3' FORCE;
SQL> DROP diskgroup dummydg;

Delete the old diskgroup information from the cluster

crsctl stat res -p | grep dg
NAME=ora.CLUSTERDG.dg
NAME=ora.DATA.dg
NAME=ora.FLASH.dg
NAME=ora.VOTEDG.dg              

crsctl delete resource ora.CLUSTERDG.dg
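The grep dg filter above is fairly loose, so here is a small sketch of a stricter way to find leftover diskgroup resources. It assumes resource names follow the ora.NAME.dg pattern seen in the output, and the set of diskgroups that still exist in ASM is hard-coded here purely for illustration (in practice it would come from the ASM instance).

```python
import re

# Output copied from the crsctl stat res listing above.
SAMPLE = """\
NAME=ora.CLUSTERDG.dg
NAME=ora.DATA.dg
NAME=ora.FLASH.dg
NAME=ora.VOTEDG.dg
"""

# Assumption for this example: diskgroups that still exist in ASM.
EXISTING_DISKGROUPS = {"DATA", "FLASH", "VOTEDG"}


def diskgroup_resources(text):
    """Return (resource_name, diskgroup_name) pairs for ora.<NAME>.dg lines."""
    return re.findall(r"^NAME=(ora\.(\w+)\.dg)[ \t]*$", text, flags=re.M)


def stale_resources(text, existing):
    """Resources whose diskgroup is no longer present in ASM."""
    return [res for res, dg in diskgroup_resources(text) if dg not in existing]


print(stale_resources(SAMPLE, EXISTING_DISKGROUPS))
```

Only the resources reported as stale would then be candidates for crsctl delete resource.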

Instead of step 8 above, where a new diskgroup was created, the failed disks could be repaired and used again to store the vote disks. Two alternative options are listed below.

Option 1

8. (Steps 1 to 7 are the same as above.) Repair the failed disks

# /etc/init.d/oracleasm deletedisk clus2
Removing ASM disk "clus2":                                 [  OK  ]
# /etc/init.d/oracleasm deletedisk clus1
Removing ASM disk "clus1":                                 [  OK  ]
# /etc/init.d/oracleasm createdisk clus1 /dev/sdc2
Marking disk "clus1" as an ASM disk:                       [  OK  ]
# /etc/init.d/oracleasm createdisk clus2 /dev/sdc3
Marking disk "clus2" as an ASM disk:                       [  OK  ]

9. Querying vote disk will show

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. OFFLINE  4d2e63d262504f4bbff82fdbf6a24869 () []
 2. OFFLINE  b161fa684ad04fa9bfd259c8cfaa2288 () []
 3. ONLINE   6d7c8162acae4f65bffb635dd7892b6a (ORCL:CLUS3) [CLUSTERDG]

10. Start the ASM instance as mentioned earlier and create a diskgroup

create diskgroup clusterdgbk disk 'ORCL:CLUS1' DISK 'ORCL:CLUS2' DISK 'ORCL:CLUS3' force attribute 'compatible.asm'='11.2';

When the statement finishes, the diskgroup will be in mounted state. "Caution: Use extreme care when using the FORCE option because the Oracle ASM instance does not verify whether the disk group is used by any other Oracle ASM instance before Oracle ASM deletes the disk group" (Storage Admin Guide).

11. Replace the vote disks with the new diskgroup

crsctl replace votedisk +clusterdgbk
CRS-4602: Failed 3 to add voting file 8b63eece18624f20bfdeb8c91a8e142d
CRS-4602: Failed 3 to add voting file 18ff8b664acb4f2bbfe18cf887b05ae7.
CRS-4602: Failed 3 to add voting file 7304aee1fd704f48bf646a27f1f8887f

Failure 3 with Cluster Synchronization Services while deleting voting disk.
Failure 3 with Cluster Synchronization Services while deleting voting disk.
Failure 3 with Cluster Synchronization Services while deleting voting disk.
Failed to replace voting disk group with +clusterdgbk.
CRS-4000: Command Replace failed, or completed with errors.

The output shows that everything has failed, and at the end of the command crs is aborted.

12. Stop any running cluster processes and start the crs in exclusive mode and query the votedisks

crsctl start crs -excl
crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   8b63eece18624f20bfdeb8c91a8e142d (ORCL:CLUS1) [CLUSTERDGBK]
 2. ONLINE   18ff8b664acb4f2bbfe18cf887b05ae7 (ORCL:CLUS2) [CLUSTERDGBK]
 3. ONLINE   7304aee1fd704f48bf646a27f1f8887f (ORCL:CLUS3) [CLUSTERDGBK]
Located 3 voting disk(s).

Vote disks have been restored. Stop crs on the node and start the cluster as normal on all nodes.

Option 2

8. (Steps 1 to 7 are the same as earlier.) Repair the disks as in option 1.

9. Mount the ASM instance and create a diskgroup

create diskgroup clusterdgbk disk 'ORCL:CLUS1' DISK 'ORCL:CLUS2' DISK 'ORCL:CLUS3' FORCE attribute 'compatible.asm'='11.2';

10. Instead of moving the vote disks to the new diskgroup, stop the crs on the node and start it in exclusive mode.

crsctl stop crs
crsctl start crs # will fail
crsctl start crs -excl

11. Query the vote disks, which will show 0 vote disks

crsctl query css votedisk
Located 0 voting disk(s).

12. Move the vote disks to new diskgroup (even though earlier command showed 0 vote disks) and query vote disks

crsctl replace votedisk +clusterdgbk
Successful addition of voting disk 82e2f5b2b6fa4f8fbf11f95e6b14b3d7
Successful addition of voting disk 921d7cda8dfc4fe6bf00ff1d19d31726
Successful addition of voting disk 14a06e308ce14f88bfb0b28797a688f0

Successfully replaced voting disk group with +clusterdgbk.
CRS-4266: Voting file(s) successfully replaced    

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   82e2f5b2b6fa4f8fbf11f95e6b14b3d7 (ORCL:CLUS1) [CLUSTERDGBK]
 2. ONLINE   921d7cda8dfc4fe6bf00ff1d19d31726 (ORCL:CLUS2) [CLUSTERDGBK]
 3. ONLINE   14a06e308ce14f88bfb0b28797a688f0 (ORCL:CLUS3) [CLUSTERDGBK]
Located 3 voting disk(s).

Stop crs on the node and start the crs on all nodes as normal.

Useful Metalink note
How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]

Tuesday, August 24, 2010

Restoring Vote disk due to ASM disk failures - 2