Sunday, February 14, 2016

Restoring OCR and Vote Disks on Block Devices in an 11gR2 Cluster (11.2.0.4)

Using block or raw devices for OCR and vote disks is not supported for new installations of 11.2. However, it is supported for upgraded systems. The following is from the Oracle clusterware admin guide: "Oracle Universal Installer for Oracle Clusterware 11g release 2 (11.2), does not support the use of raw or block devices. However, if you upgrade from a previous Oracle Clusterware release, then you can continue to use raw or block devices. Oracle recommends that you use Oracle ASM to store OCR and voting disks." In 12c block devices are not supported at all and the OCR and vote disks must be moved to ASM before the upgrade.
However, if so chosen, the OCR and vote disks can remain on block devices, such as after an upgrade from 11.1.0.7 to 11.2.0.4 (or 11.2.0.3). In 10g and 11gR1 clusters the vote disk was backed up using the dd command. However, this is not supported on 11.2. The following is from the clusterware admin guide: "The dd commands used to back up and recover voting disks in previous versions of Oracle Clusterware are not supported in Oracle Clusterware 11g release 2 (11.2). Restoring voting disks that were copied using dd or cp commands can prevent the Oracle Clusterware 11g release 2 (11.2) stack from coming up." On 11.2 the vote disks do not need to be backed up separately. From the clusterware admin guide: "In Oracle Clusterware 11g release 2 (11.2), you no longer have to back up the voting disk. The voting disk data is automatically backed up in OCR as part of any configuration change and is automatically restored to any voting disk added."
So only the OCR needs to be backed up, and this backup could be used to restore both the OCR and the vote disks (the available OCR backups can be listed as shown below). This post shows the steps for restoring OCR and vote disks that are stored on block devices, after all of the OCR and vote disks have failed or become corrupted. The environment used for this is an 11.2.0.4 two-node cluster which was previously upgraded from 11.1.0.7.
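Since the OCR is backed up automatically every four hours, the first thing worth confirming is where those automatic backups live. A minimal check, assuming the default backup location under the Grid home (cluster and path names here follow this environment):
# list the automatic (and any manual) OCR backups, kept by default under <GRID_HOME>/cdata/<cluster_name>
ocrconfig -showbackup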
The two OCR files are:
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     292924
         Used space (kbytes)      :       6452
         Available space (kbytes) :     286472
         ID                       :  675013742
         Device/File Name         :  /dev/sdb1
                                    Device/File integrity check succeeded
         Device/File Name         :  /dev/sde1
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded

         Logical corruption check succeeded
and the vote disks
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6cc871618b1e5f9fbf795315f36b3c21 (/dev/sdh1) []
 2. ONLINE   8f988bcfda0b5fe0bf783f44e81d3f66 (/dev/sdf1) []
 3. ONLINE   6af6320f0979dff7bf6fa474cea2765a (/dev/sdg1) []
The OCR and vote disks were all corrupted with the dd command:
 for i in /dev/sdb1 /dev/sde1 /dev/sdh1 /dev/sdg1 /dev/sdf1
> do
> dd if=/dev/zero of=$i bs=8192 count=1000
> done
This results in the OCR and vote disks becoming unusable. The output below shows the vote disks in the pending offline state.
crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. PENDOFFL 6cc871618b1e5f9fbf795315f36b3c21 (/dev/sdh1) []
 2. PENDOFFL 8f988bcfda0b5fe0bf783f44e81d3f66 (/dev/sdf1) []
 3. PENDOFFL 6af6320f0979dff7bf6fa474cea2765a (/dev/sdg1) []
The following messages could be observed in the ocssd.log:
2016-01-15 17:08:16.921: [    CSSD][3422501184]clssnmvVoteDiskValidation: Voting disk /dev/sdh1 is corrupted
2016-01-15 17:08:16.921: [    CSSD][3422501184]clssnmvWorkerThread: disk /dev/sdh1 corrupted
2016-01-15 17:08:16.921: [    CSSD][3422501184]clssnmvDiskAvailabilityChange: voting file /dev/sdh1 now offline
2016-01-15 17:08:17.405: [   SKGFD][3419347264]Lib :UFS:: closing handle 0x85937a0 for disk :/dev/sdh1:

2016-01-15 17:08:20.746: [    CSSD][3412990272]clssnmvVoteDiskValidation: Voting disk /dev/sdg1 is corrupted
2016-01-15 17:08:20.746: [    CSSD][3412990272]clssnmvWorkerThread: disk /dev/sdg1 corrupted
2016-01-15 17:08:20.746: [    CSSD][3412990272]clssnmvDiskAvailabilityChange: voting file /dev/sdg1 now offline
2016-01-15 17:08:20.901: [   SKGFD][3422501184]Lib :UFS:: closing handle 0x86afd90 for disk :/dev/sdh1:
Eventually the two nodes were rebooted, after which the OCR and vote disk restore process began.
Stop the clusterware on all nodes with the -f option.
# crsctl stop crs -f
At times this could take a while. The quickest option was to disable CRS and reboot the nodes. Once the nodes start, enable CRS again but do not start the clusterware stack (a sketch of this sequence is shown below).
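A minimal sketch of that disable/reboot/enable sequence, run as root on each node (it assumes nothing else on the node needs a graceful shutdown before the reboot):
# prevent the clusterware stack from auto-starting on boot
crsctl disable crs
# reboot the node
reboot
# once the node is back up, re-enable auto-start but do not start the stack yet
crsctl enable crs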



When the clusterware stack is down on all nodes, start it on a single node only, with the exclusive and nocrs options.
crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
If the crsd process is running, stop it (its state can be checked as shown after the command below).
crsctl stop resource ora.crsd -init
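Whether crsd is running under the lower (init) stack can be checked with a standard crsctl status query, shown here only as a convenience:
# query the state of the crsd resource in the init-managed stack
crsctl status resource ora.crsd -init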
Restore the OCR from a backup.
# ocrconfig -restore /opt/app/11.2.0/grid/cdata/cg_11g_cluster/backup_20160115_170721.ocr
# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     292924
         Used space (kbytes)      :       6452
         Available space (kbytes) :     286472
         ID                       :  675013742
         Device/File Name         :  /dev/sdb1
                                    Device/File integrity check succeeded
         Device/File Name         :  /dev/sde1
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

         Logical corruption check succeeded
As seen from the ocrcheck output, the OCR files were restored to the original locations listed in the ocr.loc file (which can be inspected as shown below).
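On Linux the ocr.loc file lives in /etc/oracle. The contents shown below are what this environment would be expected to contain, given the devices listed in the ocrcheck output:
cat /etc/oracle/ocr.loc
ocrconfig_loc=/dev/sdb1
ocrmirrorconfig_loc=/dev/sde1
local_only=FALSE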
At this stage the vote disks are still not listed.
crsctl query css votedisk
Located 0 voting disk(s).
To restore the vote disks, run crsctl add css votedisk specifying the block device locations (crsctl replace votedisk is not applicable for non-ASM locations).
# crsctl add css  votedisk /dev/sdh1 /dev/sdg1 /dev/sdf1
Now formatting voting disk: /dev/sdh1.
Now formatting voting disk: /dev/sdg1.
Now formatting voting disk: /dev/sdf1.
CRS-4603: Successful addition of voting disk /dev/sdh1.
CRS-4603: Successful addition of voting disk /dev/sdg1.
CRS-4603: Successful addition of voting disk /dev/sdf1.
[root@rac1 oracle]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   274a68a133e84f65bf3b7b7e966f3862 (/dev/sdh1) []
 2. ONLINE   b0615b1176ef4f12bfdd34c115620249 (/dev/sdg1) []
 3. ONLINE   b1eb954b7a454f32bf0c54f252776c1d (/dev/sdf1) []
Stop the clusterware stack on this node
 crsctl stop crs -f
and start the clusterware stack on all nodes
crsctl start crs -nowait
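Once the stack is up on all nodes, the OCR integrity can optionally be re-verified cluster-wide. A minimal check with the cluster verification utility, run as the grid infrastructure owner:
cluvfy comp ocr -n all -verbose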
This concludes the steps for restoring the OCR and vote disks on an 11.2.0.4 cluster when they are stored on block devices.

Useful metalink notes
How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]
Bug 12543757 : UNABLE TO MOVE VOTING DISK FROM RAW DEVICES TO NFS

Related Posts
Restoring OCR due to ASM disk failures - 1
Restoring OCR due to ASM disk failures - 2
Restoring Vote disk due to ASM disk failures - 1
Restoring Vote disk due to ASM disk failures - 2
Restoring Vote disk due to ASM disk failures - 3
Restoring OCR & Vote disk due to ASM disk failures - 1
Restoring OCR & Vote disk due to ASM disk failures - 2
Restoring OCR & Vote disk due to ASM disk failures - 3

Monday, February 1, 2016

Installing 11.2.0.4 Standalone Server with ASM and Role Separation on RHEL 7

There are two earlier posts which show the highlights of installing 11gR2 on a standalone server with ASM and role separation on RHEL 6 and RHEL 5. This post only mentions the steps that are different from the earlier setups; it is not an extensive how-to-install guide.
One of the main issues that existed was the failure of root.sh on RHEL 7. This was due to the fact that startup scripts on RHEL 7 are created as systemd services, while root.sh was trying to use the pre-RHEL 7 method of inittab entries.
# /opt/app/oracle/product/11.2.0/grid/root.sh
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /opt/app/oracle/product/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'grid', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node rhel7 successfully pinned.
Adding Clusterware entries to inittab
ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow:
2015-05-27 14:00:52.832:
[client(23215)]CRS-2101:The OLR was formatted using version 3.
2015-05-27 14:00:53.700:
[client(23242)]CRS-1001:The OCR was formatted using version 3.

ohasd failed to start at /opt/app/oracle/product/11.2.0/grid/crs/install/roothas.pl line 377,  line 4.
/opt/app/oracle/product/11.2.0/grid/perl/bin/perl -I/opt/app/oracle/product/11.2.0/grid/perl/lib -I/opt/app/oracle/product/11.2.0/grid/crs/install /opt/app/oracle/product/11.2.0/grid/crs/install/roothas.pl execution failed
The workaround was to manually create a service file, and later on Oracle also published a MOS note on this (1959008.1). This has now been fixed; a sketch of the old manual workaround is shown below for reference.
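The manual workaround amounted to creating a systemd unit that keeps init.ohasd running, roughly as below. This is only an illustrative sketch; the unit name, path and exact contents should be taken from 1959008.1.
# create a systemd unit that runs init.ohasd (illustrative name and path)
cat > /etc/systemd/system/ohas.service <<'EOF'
[Unit]
Description=Oracle High Availability Services
After=syslog.target

[Service]
ExecStart=/etc/init.d/init.ohasd run
Type=simple
Restart=always

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable ohas.service
systemctl start ohas.service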
It is assumed that all necessary pre-reqs are completed and users have been created for a role-separated installation similar to the RHEL 6 setup (an illustrative sketch is shown below). The software used is the 13390677 patchset (11.2.0.4 software).
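For context only, a role-separated user layout typically looks like the following. The group and user names here are illustrative and should match whatever the earlier RHEL 6 setup used:
# OS groups for role separation (names are illustrative)
groupadd oinstall
groupadd dba
groupadd asmadmin
groupadd asmdba
groupadd asmoper
# grid infrastructure owner and database software owner
useradd -g oinstall -G asmadmin,asmdba,asmoper grid
useradd -g oinstall -G dba,asmdba oracle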
1. As the grid user, unzip the grid infrastructure software (p13390677_112040_Linux-x86-64_3of7.zip). Before the installation begins, apply patch 19404309. This is not a typical patch apply with OPatch, but a copying of files that eliminates the issues associated with installing 11.2.0.4 on RHEL 7.
2. Run the cluster verify tool for the high availability service option. The failure for pdksh could be ignored (refer 1962046.1). This failure is also visible on the OUI and could also be ignored, similar to 11.2.0.3 on RHEL 6.
./runcluvfy.sh  stage -pre hacfg

Performing pre-checks for Oracle Restart configuration
Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "rhel7:/tmp"
Check for multiple users with UID value 1002 passed
User existence check passed for "grid"
Group existence check passed for "oinstall"
Group existence check passed for "dba"
Membership check for user "grid" in group "oinstall" [as Primary] passed
Membership check for user "grid" in group "dba" passed
Run level check passed
Hard limits check passed for "maximum open file descriptors"
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make"
Package existence check passed for "binutils"
Package existence check passed for "gcc(x86_64)"
Package existence check passed for "libaio(x86_64)"
Package existence check passed for "glibc(x86_64)"
Package existence check passed for "compat-libstdc++-33(x86_64)"
Package existence check passed for "elfutils-libelf(x86_64)"
Package existence check passed for "elfutils-libelf-devel"
Package existence check passed for "glibc-common"
Package existence check passed for "glibc-devel(x86_64)"
Package existence check passed for "glibc-headers"
Package existence check passed for "gcc-c++(x86_64)"
Package existence check passed for "libaio-devel(x86_64)"
Package existence check passed for "libgcc(x86_64)"
Package existence check passed for "libstdc++(x86_64)"
Package existence check passed for "libstdc++-devel(x86_64)"
Package existence check passed for "sysstat"
Package existence check failed for "pdksh"
Check failed on nodes:
        rhel7
Package existence check passed for "expat(x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed

Starting check for consistency of primary group of root user

Check for consistency of root user's primary group passed

Pre-check for Oracle Restart configuration was unsuccessful.
3. Run the installer and proceed as before. However, do not run the root.sh script when prompted.
Patch 18370031 must be applied before root.sh can be run. Refer to the patch's readme on how to apply it before the GI home is configured.
$GI_HOME/OPatch/opatch napply -oh $GI_HOME -local ./18370031
Once the patch is applied, run root.sh. With the patch in place, root.sh correctly configures an oracle-ohasd service on RHEL 7. There is no need to manually create a service file anymore (the steps in 1959008.1 are not needed).
# /opt/app/oracle/product/11.2.0/grid/root.sh

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /opt/app/oracle/product/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'grid', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node rhel7 successfully pinned.
Adding Clusterware entries to oracle-ohasd.service

rhel7     2016/01/26 13:49:04     /opt/app/oracle/product/11.2.0/grid/cdata/rhel7/backup_20160126_134904.olr
Successfully configured Oracle Grid Infrastructure for a Standalone Server


4. Once root.sh has finished, continue with the rest of the installation to completion.
crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rhel7
ora.FRA.dg
               ONLINE  ONLINE       rhel7
ora.LISTENER.lsnr
               ONLINE  ONLINE       rhel7
ora.asm
               ONLINE  ONLINE       rhel7                    Started
ora.ons
               OFFLINE OFFLINE      rhel7
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       rhel7
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  ONLINE       rhel7
5. To install the database software, extract the relevant zip files (p13390677_112040_Linux-x86-64_1of7.zip and p13390677_112040_Linux-x86-64_2of7.zip). Similar to the GI installation, apply patch 19404309 before the installation begins. Once the patch is applied, run the installer.
6. During the installation the ins_emagent.mk error mentioned in an earlier post (the "undefined reference to symbol 'B_DestroyKeyObject'" failure, refer 1965691.1) could be observed.
Ignore the error and continue the installation to completion. Once the database software is installed, apply patch 19692824 before the database is created (a sketch of a standard OPatch apply is shown below).
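A minimal sketch of applying that patch as the database software owner, assuming the patch zip has been unzipped into /tmp (the path is illustrative; follow the patch readme for the authoritative steps):
cd /tmp/19692824
$ORACLE_HOME/OPatch/opatch apply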
7. As the last step, create the database. There are no errors or issues related to RHEL 7 during the database creation using DBCA.
8. If a GI patch is applied using the opatch auto option, it would fail with the following:
$ORACLE_HOME/OPatch/opatch auto ./21523375 -ocmrf ../ocm.rsp
Can't locate Switch.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /opt/app/oracle/product/11.2.0/grid/OPatch/crs/auto_patch.pl line 2730.
The reason is that RHEL 7 ships a newer version of Perl:
 perl -v

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
(with 25 registered patches, see perl -V for more detail)
where the Switch module is deprecated. The patch could be applied manually, or the Switch.pm module could be downloaded from CPAN, installed, and opatch auto tried again. Refer to 1915430.1 for more.
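One possible way to fetch and unpack the module before the build steps that follow (the CPAN mirror URL shown is illustrative and may differ):
# download and unpack Switch-2.17 from a CPAN mirror (URL is illustrative)
wget https://cpan.metacpan.org/authors/id/C/CH/CHORNY/Switch-2.17.tar.gz
tar -xzf Switch-2.17.tar.gz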
cd Switch-2.17
perl Makefile.PL
make test
make install
Manifying blib/man3/Switch.3pm
Installing /usr/local/share/perl5/Switch.pm
Installing /usr/local/share/man/man3/Switch.3pm
Appending installation info to /usr/lib64/perl5/perllocal.pod
Afterwards the opatch auto continues without any issues.

Useful metalink notes
Opatch Auto fails with: Can't locate Switch.pm in @INC [ID 1915430.1]
Installation walkthrough - Oracle Grid/RAC 11.2.0.4 on Oracle Linux 7 [ID 1951613.1]
Install of Clusterware fails while running root.sh on OL7 - ohasd fails to start [ID 1959008.1]
Requirements for Installing Oracle 11.2.0.4 RDBMS on RHEL7 or OL7 64-bit (x86-64) [ID 1962100.1]
Missing pdksh-5.2.14 package during Oracle database 11.2.0.4 install on Oracle Linux 7 [ID 1962046.1]
Installation of Oracle 11.2.0.4 on OL7 fails with “undefined reference to symbol ‘B_DestroyKeyObject’” error [ID 1965691.1]

Related Posts
ASM for Standalone Server in 11gR2 with Role Separation (on RHEL 5)
Installing 11gR2 Standalone Server with ASM and Role Separation on RHEL 6
Installing 11gR2 (11.2.0.3) GI with Role Separation on RHEL 6
Installing 11gR2 (11.2.0.3) GI with Role Separation on OEL 6
Installing 11.2.0.3 on RHEL 6
ins_emagent.mk Related Error When Installing 11.2.0.4 Database on RHEL 7
Installing Oracle Database 12.1.0.2 on RHEL 7

Update on 2017-07-10
As per 2282371.1 and 2281492.1, certain RHEL 7 kernel versions (3.10.0-514.21.2.el7.x86_64) cause issues with starting OHASD in 11.2.0.4. Avoid these kernel versions when installing or upgrading the OS on servers where 11.2.0.4 is installed. The running kernel version can be checked as shown below.
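A quick way to confirm the kernel in use before installing or patching:
# show the running kernel release
uname -r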

Useful metalink notes
OHASD Fails to Start With Kernel Version 3.10.0-514.21.2.el7.x86_64 [ID 2281492.1]
Grid Infrastructure Fails to Start OHASD With RedHat Linux or Oracle Linux with RedHat Compatible Kernel (RHCK) Version 3.10.0-514.21.2.EL7.X86_64 or Higher [ID 2282371.1]