Thursday, July 14, 2016

Removing a Failed Node From the Cluster

This post list steps for removing a failed node from a cluster. The steps differs from steps in the previous node deletion posts (11gR1,11gR2 and 12c) such that one node has suffered a catastrophic failure and is not available for any kind of command or script executions. Therefore all the activities involved in removing the failed node are executed from a surviving node.
The environment used in this case is a two node RAC with role separation(11.2.0.4). Under normal operation it has the following resources and status. (formatted status)
Resource Name                       Type                      Target             State              Host
-------------                       ------                    -------            --------           ----------
ora.CLUSTER_DG.dg                   ora.diskgroup.type        ONLINE             ONLINE             rhel6m1
ora.CLUSTER_DG.dg                   ora.diskgroup.type        ONLINE             ONLINE             rhel6m2
ora.DATA.dg                         ora.diskgroup.type        ONLINE             ONLINE             rhel6m1
ora.DATA.dg                         ora.diskgroup.type        ONLINE             ONLINE             rhel6m2
ora.FLASH.dg                        ora.diskgroup.type        ONLINE             ONLINE             rhel6m1
ora.FLASH.dg                        ora.diskgroup.type        ONLINE             ONLINE             rhel6m2
ora.MYLISTENER.lsnr                 ora.listener.type         ONLINE             ONLINE             rhel6m1
ora.MYLISTENER.lsnr                 ora.listener.type         ONLINE             ONLINE             rhel6m2
ora.MYLISTENER_SCAN1.lsnr           ora.scan_listener.type    ONLINE             ONLINE             rhel6m2
ora.asm                             ora.asm.type              ONLINE             ONLINE             rhel6m1
ora.asm                             ora.asm.type              ONLINE             ONLINE             rhel6m2
ora.cvu                             ora.cvu.type              ONLINE             ONLINE             rhel6m2
ora.gsd                             ora.gsd.type              OFFLINE            OFFLINE
ora.gsd                             ora.gsd.type              OFFLINE            OFFLINE
ora.net1.network                    ora.network.type          ONLINE             ONLINE             rhel6m1
ora.net1.network                    ora.network.type          ONLINE             ONLINE             rhel6m2
ora.oc4j                            ora.oc4j.type             ONLINE             ONLINE             rhel6m2
ora.ons                             ora.ons.type              ONLINE             ONLINE             rhel6m1
ora.ons                             ora.ons.type              ONLINE             ONLINE             rhel6m2
ora.registry.acfs                   ora.registry.acfs.type    ONLINE             ONLINE             rhel6m1
ora.registry.acfs                   ora.registry.acfs.type    ONLINE             ONLINE             rhel6m2
ora.rhel6m1.vip                     ora.cluster_vip_net1.type ONLINE             ONLINE             rhel6m1
ora.rhel6m2.vip                     ora.cluster_vip_net1.type ONLINE             ONLINE             rhel6m2
ora.scan1.vip                       ora.scan_vip.type         ONLINE             ONLINE             rhel6m2
ora.std11g2.db                      ora.database.type         ONLINE             ONLINE             rhel6m1
ora.std11g2.db                      ora.database.type         ONLINE             ONLINE             rhel6m2
ora.std11g2.myservice.svc           ora.service.type          ONLINE             ONLINE             rhel6m1
ora.std11g2.myservice.svc           ora.service.type          ONLINE             ONLINE             rhel6m2
ora.std11g2.abx.domain.net.svc      ora.service.type          ONLINE             ONLINE             rhel6m2
ora.std11g2.abx.domain.net.svc      ora.service.type          ONLINE             ONLINE             rhel6m1
After the node 2 (rhel6m2 node in this case) suffers a catastrophic failure, resources and status is as below. There are offline and failed over (vip) resources from rhel6m2.
Resource Name                       Type                      Target             State              Host
-------------                       ------                    -------            --------           ----------
ora.CLUSTER_DG.dg                   ora.diskgroup.type        ONLINE             ONLINE             rhel6m1
ora.DATA.dg                         ora.diskgroup.type        ONLINE             ONLINE             rhel6m1
ora.FLASH.dg                        ora.diskgroup.type        ONLINE             ONLINE             rhel6m1
ora.MYLISTENER.lsnr                 ora.listener.type         ONLINE             ONLINE             rhel6m1
ora.MYLISTENER_SCAN1.lsnr           ora.scan_listener.type    ONLINE             ONLINE             rhel6m1
ora.asm                             ora.asm.type              ONLINE             ONLINE             rhel6m1
ora.cvu                             ora.cvu.type              ONLINE             ONLINE             rhel6m1
ora.gsd                             ora.gsd.type              OFFLINE            OFFLINE
ora.net1.network                    ora.network.type          ONLINE             ONLINE             rhel6m1
ora.oc4j                            ora.oc4j.type             ONLINE             ONLINE             rhel6m1
ora.ons                             ora.ons.type              ONLINE             ONLINE             rhel6m1
ora.registry.acfs                   ora.registry.acfs.type    ONLINE             ONLINE             rhel6m1
ora.rhel6m1.vip                     ora.cluster_vip_net1.type ONLINE             ONLINE             rhel6m1
ora.rhel6m2.vip                     ora.cluster_vip_net1.type ONLINE             INTERMEDIATE       rhel6m1
ora.scan1.vip                       ora.scan_vip.type         ONLINE             ONLINE             rhel6m1
ora.std11g2.db                      ora.database.type         ONLINE             ONLINE             rhel6m1
ora.std11g2.db                      ora.database.type         ONLINE             OFFLINE
ora.std11g2.myservice.svc           ora.service.type          ONLINE             ONLINE             rhel6m1
ora.std11g2.myservice.svc           ora.service.type          ONLINE             OFFLINE
ora.std11g2.abx.domain.net.svc      ora.service.type          ONLINE             OFFLINE
ora.std11g2.abx.domain.net.svc      ora.service.type          ONLINE             ONLINE             rhel6m1
Removing of resources of the failed node begins at database resource level. There are two services running and they both have the DB instance on the failed node as a preferred instance (output is condensed)
srvctl config service -d std11g2
Service name: myservice
Service is enabled
Server pool: std11g2_myservice
Cardinality: 2
...
Preferred instances: std11g21,std11g22
Available instances:
Service name: abx.domain.net
Service is enabled
Server pool: std11g2_abx.domain.net
Cardinality: 2
...
Preferred instances: std11g21,std11g22
Available instances:
Modify the service configuration so that only the surviving instances are set as preferred instances.
$ srvctl modify service -s myservice -d std11g2 -n -i std11g21 -f
$ srvctl modify service -s abx.domain.net -d std11g2 -n -i std11g21 -f

$ srvctl config service -d std11g2
Service name: myservice
Service is enabled
Server pool: std11g2_myservice
Cardinality: 1
..
Preferred instances: std11g21
Available instances:
Service name: abx.domain.net
Service is enabled
Server pool: std11g2_abx.domain.net
Cardinality: 1
..
Preferred instances: std11g21
Available instances:

$ srvctl status service -d std11g2
Service myservice is running on instance(s) std11g21
Service abx.domain.net is running on instance(s) std11g21
Remove the database instance on the failed node
srvctl config database -d std11g2
Database unique name: std11g2
Database name: std11g2
...
Database instances: std11g21,std11g22
Disk Groups: DATA,FLASH
Mount point paths:
Services: myservice,abx.domain.net
Type: RAC
Database is administrator managed
This is done using DBCA's instance management option. If the listener has a non-default name and port then accessing the DB will fail with below message.
To fix this create a default listener (name listener and port 1521). Also if VNCR is used then remove the failed node from the registration list. Proceed to instance deletion by selecting the inactive instance on the failed node.
As node 2 is not available following warning will be issued. Click continue and proceed. During the execution various other warning will appear such as unable to remove /etc/oratab etc all of these could be ignored.
However DBCA didn't run till end, at 67% (observed through repeated runs on this 11.2.0.4 environment) following dialog box appeared. As seen on the screenshot it has no message, just an OK button. Clicking it doesn't end the DBCA session but goes to the beginning and exit the DBCA clicking cancel afterwards.
However this doesn't appear to be a failure on the DBCA to remove the instance. In fact instance is removed as subsequent instance operation only list the instance on the surviving node.
Querying the database also shows that instance 2 (std11g22 in this case) related undo tablespace and redo logs have been removed and only surviving instance related undo tablespace and redo logs are available.
SQL> select name from v$tablespace;

NAME
------------------------------
SYSTEM
SYSAUX
UNDOTBS1
TEMP
USERS
EXAMPLE
TEST

7 rows selected.

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ ---------
         1          1       1598   52428800        512          2 NO  CURRENT               68471125 07-JUL-16   2.8147E+14
         2          1       1597   52428800        512          2 YES INACTIVE              68467762 07-JUL-16     68471125 07-JUL-16

srvctl config database -d std11g2
Database unique name: std11g2
Database name: std11g2
...
Database instances: std11g21
Disk Groups: DATA,FLASH
Mount point paths:
Services: myservice,abx.domain.net
Type: RAC
Database is administrator managed
Once the database resources are removed next step is to remove the Oracle database home entry for the failed node from the inventory.



As the node is unavailable, there's no un-installation involved. Run the inventory update command with surviving nodes. Inventory content for the Oracle home before the failed node is removed.
<HOME NAME="OraDb11g_home2" LOC="/opt/app/oracle/product/11.2.0/dbhome_4" TYPE="O" IDX="4">
   <NODE_LIST>
      <NODE NAME="rhel6m1"/>
      <NODE NAME="rhel6m2"/>
   </NODE_LIST>
</HOME>
After the inventory update
 ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={rhel6m1}"
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 4095 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oraInventory
'UpdateNodeList' was successful.

<HOME NAME="OraDb11g_home2" LOC="/opt/app/oracle/product/11.2.0/dbhome_4" TYPE="O" IDX="4">
   <NODE_LIST>
      <NODE NAME="rhel6m1"/>
   </NODE_LIST>
</HOME>
Next step is to remove the cluster resources and the node itself. If any of the node is in pin stat, unpin them. In this case both nodes are unpinned
olsnodes -s -t
rhel6m1 Active  Unpinned
rhel6m2 Inactive        Unpinned
Stop and remove the VIP resource of the failed node
# srvctl stop  vip -i rhel6m2-vip -f
# srvctl remove vip -i rhel6m2-vip -f
Remove the failed node from the cluster configuration
#  crsctl delete node -n rhel6m2
CRS-4661: Node rhel6m2 successfully deleted.
Finally remove the grid home for the failed node from the inventory. Before inventory update
<HOME NAME="Ora11g_gridinfrahome2" LOC="/opt/app/11.2.0/grid4" TYPE="O" IDX="3" CRS="true">
   <NODE_LIST>
      <NODE NAME="rhel6m1"/>
      <NODE NAME="rhel6m2"/>
   </NODE_LIST>
</HOME>
After inventory update
./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={rhel6m1}" CRS=TRUE
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 4095 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oraInventory
'UpdateNodeList' was successful.

<HOME NAME="Ora11g_gridinfrahome2" LOC="/opt/app/11.2.0/grid4" TYPE="O" IDX="3" CRS="true">
   <NODE_LIST>
      <NODE NAME="rhel6m1"/>
   </NODE_LIST>
</HOME>
Validate the node removal with cluvfy
cluvfy stage -post  nodedel -n rhel6m2

Performing post-checks for node removal
Checking CRS integrity...
Clusterware version consistency passed
CRS integrity check passed
Node removal check passed
Post-check for node removal was successful.
Remove the default listener if one was created during instance remove step. The final status of resource is as below.
Resource Name                       Type                      Target             State              Host
-------------                       ------                    -------            --------           ----------
ora.CLUSTER_DG.dg                   ora.diskgroup.type        ONLINE             ONLINE             rhel6m1
ora.DATA.dg                         ora.diskgroup.type        ONLINE             ONLINE             rhel6m1
ora.FLASH.dg                        ora.diskgroup.type        ONLINE             ONLINE             rhel6m1
ora.MYLISTENER.lsnr                 ora.listener.type         ONLINE             ONLINE             rhel6m1
ora.MYLISTENER_SCAN1.lsnr           ora.scan_listener.type    ONLINE             ONLINE             rhel6m1
ora.asm                             ora.asm.type              ONLINE             ONLINE             rhel6m1
ora.cvu                             ora.cvu.type              ONLINE             ONLINE             rhel6m1
ora.gsd                             ora.gsd.type              OFFLINE            OFFLINE
ora.net1.network                    ora.network.type          ONLINE             ONLINE             rhel6m1
ora.oc4j                            ora.oc4j.type             ONLINE             ONLINE             rhel6m1
ora.ons                             ora.ons.type              ONLINE             ONLINE             rhel6m1
ora.registry.acfs                   ora.registry.acfs.type    ONLINE             ONLINE             rhel6m1
ora.rhel6m1.vip                     ora.cluster_vip_net1.type ONLINE             ONLINE             rhel6m1
ora.scan1.vip                       ora.scan_vip.type         ONLINE             ONLINE             rhel6m1
ora.std11g2.db                      ora.database.type         ONLINE             ONLINE             rhel6m1
ora.std11g2.myservice.svc           ora.service.type          ONLINE             ONLINE             rhel6m1
ora.std11g2.abx.domain.net.svc      ora.service.type          ONLINE             ONLINE             rhel6m1
Useful metalink notes
How to remove/delete a node from Grid Infrastructure Clusterware when the node has failed [ID 1262925.1]
Steps to Remove Node from Cluster When the Node Crashes Due to OS/Hardware Failure and cannot boot up [ID 466975.1]
RAC on Windows: How to Remove a Node from a Cluster When the Node Crashes Due to OS/Hardware Failure and Cannot Boot [ID 832054.1]

Related Post
Deleting a Node From 12cR1 RAC
Deleting a Node From 11gR2 RAC
Deleting a 11gR1 RAC Node

Friday, July 1, 2016

Moving Role Separated Oracle Restart Setup to Single User Setup

Oracle role separation setup involves installing the grid infrastructure and oracle database software as different users. Commonly GI is installed as grid user and oracle database software as oracle user. On rare occasions this role separation could result in behavior that is not experienced when both GI and Oracle DB is installed using a single user. Issues with secure password store and external tables are just two examples.
This post shows steps to moving from a role separated Oracle restart setup to a single user setup.
The way to do this is to remove the GI software installed with grid user and reinstall it with oracle user. However it is not possible to remove the the existing GI home while oracle database home is configured on it.
[grid@rhel7 deinstall]$ ./deinstall
Checking for required files and bootstrapping ...
Please wait ...
Location of logs /opt/app/oraInventory/logs/
...
Traces log file: /opt/app/oraInventory/logs//crsdc_2016-02-24_07-28-15PM.log
ERROR: Can't deconfigure Oracle Restart before removing or downgrading managed Oracle databases.
So the option is to manually remove the grid home. This oracle restart environment used here is 12.1.0.2 on RHEL 7
1. Create a pfile from the ASM spfile
SQL> create pfile='/home/grid/asmpfile.ora' from spfile;

[grid@rhel7 ~]$ more asmpfile.ora
+ASM.__oracle_base='/opt/app/oracle'#ORACLE_BASE set from in memory value
+ASM.asm_diskgroups='FRA'#Manual Mount
*.asm_diskstring='/dev/sd*'
*.asm_power_limit=1
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
2. Backup the oratab file

3. Stop the HAS service
crsctl stop has
4. As root user, delete the GI home installed with grid user
[root@rhel7 12.1.0]# pwd
/opt/app/oracle/product/12.1.0
[root@rhel7 12.1.0]# rm -rf grid
5. Remove other HAS related files
cd /etc
rm -rf ora*
find . -name *ohasd -exec rm {} \;

rm /etc/systemd/system/oracle-ohasd.service
rm /etc/init.d/*ohasd
rm -rf /opt/ORCLfmap
rm -rf /var/tmp/.oracle
Removing of "/var/tmp/.oracle" is important without removing it, trying to reinstall will fail. More on 1997268.1
2016/02/24 20:04:38 CLSRSC-318: Failed to start Oracle OHASD service
Died at /opt/app/oracle/product/12.1.0/grid_2/crs/install/crsinstall.pm line 3041.
6. Backup the existing oraInventory and create a new oraInventory directory as oracle user
drwxrwx---.  6 grid   oinstall   97 Feb  5 19:38 oraInventory

# mv oraInventory oraInventorybak
# mkdir oraInventory
# chown oracle:oinstall oraInventory
7. Change udev file so that owner is the oracle user
OWNER="oracle", GROUP="asmadmin", MODE="0660"
8. Remove grid user owned GI related folders from ORACLE_BASE
 cd $ORACLE_BASE
# ls -l
drwxrwx---.  4 grid   oinstall   29 Sep 29 16:58 admin
drwxr-x---.  3 oracle oinstall   18 Sep 29 17:26 audit
drwxrwxr-x.  6 grid   oinstall   56 Sep 30 15:38 cfgtoollogs
drwxr-xr-x.  2 grid   oinstall    6 Sep 29 16:43 checkpoints
drwxrwx---.  4 grid   oinstall   32 Sep 29 16:39 crsdata
drwxr-x---. 19 grid   oinstall 4096 Sep 29 16:38 diag
drwxr-xr-x.  3 grid   oinstall   17 Feb  4 17:44 log
drwxrwxr-x.  3 grid   oinstall   19 Sep 29 16:33 product
drwxr-xr-x.  3 grid   oinstall   24 Sep 29 16:39 rhel7

rm -rf admin/+ASM
rm -rf cfgtoollogs
rm -rf checkpoints
rm -rf crsdata
rm -rf diag
rm -rf log
rm -rf rhel7/

# ls -l

drwxrwx---. 3 grid   oinstall 18 Feb 24 19:39 admin
drwxr-x---. 3 oracle oinstall 18 Sep 29 17:26 audit
drwxrwxr-x. 3 grid   oinstall 19 Sep 29 16:33 product
Change the ownership of the remaining files to oracle user
chown oracle:oinstall product/12.1.0
chown oracle:oinstall product
chown oracle:oinstall admin

# ls -l

drwxrwx---. 3 oracle oinstall 18 Feb 24 19:39 admin
drwxr-x---. 3 oracle oinstall 18 Sep 29 17:26 audit
drwxrwxr-x. 3 oracle oinstall 19 Sep 29 16:33 product
9. Reboot the server. Once restarted verify the permission on the block devices used for ASM
ls -l /dev/sd*
brw-rw----. 1 oracle asmadmin 8, 17 Feb 24 19:42 /dev/sdb1
brw-rw----. 1 oracle asmadmin 8, 33 Feb 24 19:42 /dev/sdc1
10. Add to oracle user asm related os groups. The final group list for oracle user is as below
id oracle
uid=1001(oracle) gid=1001(oinstall) groups=1001(oinstall),1002(dba),1003(oper),1005(asmoper),1006(asmdba),1007(asmadmin),1008(backupdba),1009(dgdba),1010(kmdba)
11. Install the GI software selecting "software only" option.

12. Run the root scripts when prompted
# /opt/app/oraInventory/orainstRoot.sh
Changing permissions of /opt/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /opt/app/oraInventory to oinstall.
The execution of the script is complete.

# /opt/app/oracle/product/12.1.0/grid_2/root.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /opt/app/oracle/product/12.1.0/grid_2

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

To configure Grid Infrastructure for a Stand-Alone Server run the following command as the root user:
/opt/app/oracle/product/12.1.0/grid_2/perl/bin/perl -I/opt/app/oracle/product/12.1.0/grid_2/perl/lib -I/opt/app/oracle/product/12.1.0/grid_2/crs/install /opt/app/oracle/product/12.1.0/grid_2/crs/install/roothas.pl

To configure Grid Infrastructure for a Cluster execute the following command as oracle user:
/opt/app/oracle/product/12.1.0/grid_2/crs/config/config.sh
This command launches the Grid Infrastructure Configuration Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.
Run the command to configure Oracle restart.
# /opt/app/oracle/product/12.1.0/grid_2/perl/bin/perl -I/opt/app/oracle/product/12.1.0/grid_2/perl/lib -I/opt/app/oracle/product/12.1.0/grid_2/crs/install /opt/app/oracle/product/12.1.0/grid_2/crs/install/roothas.pl
Using configuration parameter file: /opt/app/oracle/product/12.1.0/grid_2/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node rhel7 successfully pinned.
2016/03/01 18:39:59 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'

rhel7     2016/03/01 18:40:32     /opt/app/oracle/product/12.1.0/grid_2/cdata/rhel7/backup_20160301_184032.olr     0
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rhel7'
CRS-2673: Attempting to stop 'ora.evmd' on 'rhel7'
CRS-2677: Stop of 'ora.evmd' on 'rhel7' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rhel7' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2016/03/01 18:40:49 CLSRSC-327: Successfully configured Oracle Restart for a standalone server

crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ons
               OFFLINE OFFLINE      rhel7                    STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        OFFLINE OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       rhel7                    STABLE
--------------------------------------------------------------------------------


13. Add ASM home entry to the /etc/oratab file
+ASM:/opt/app/oracle/product/12.1.0/grid_2:N
14. Add listener and ASM to HAS configuration
$ srvctl add listener -l listener -o /opt/app/oracle/product/12.1.0/grid_2 -p 1521
$ srvctl start listener -l listener
$ srvctl status listener -l listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): rhel7

$ srvctl add asm -l listener  -d "/dev/sd*"
$ srvctl start asm
$ srvctl config asm
ASM home: 
Password file:
ASM listener: LISTENER
Spfile:
ASM diskgroup discovery string: /dev/sd*

[oracle@rhel7 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       rhel7                    STABLE
ora.asm
               ONLINE  ONLINE       rhel7                    Started,STABLE
ora.ons
               OFFLINE OFFLINE      rhel7                    STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       rhel7                    STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       rhel7                    STABLE
--------------------------------------------------------------------------------
15. Modify the inventory for GI Home to include crs=true. The inventory.xml content before and after the modification
<HOME_LIST>
<HOME NAME="OraGI12Home1" LOC="/opt/app/oracle/product/12.1.0/grid_2" TYPE="O" IDX="1"/>
</HOME_LIST>

./runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/product/12.1.0/grid_2 CRS=TRUE -silent

<HOME_LIST>
<HOME NAME="OraGI12Home1" LOC="/opt/app/oracle/product/12.1.0/grid_2" TYPE="O" IDX="1" CRS="true"/>
</HOME_LIST>
16. Patch the GI home to the same level as the Oracle database home.

17. Attach the Oracle database home
./runInstaller -silent -attachHome ORACLE_HOME=$ORACLE_HOME ORACLE_HOME_NAME="oracle_home"
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 4097 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'AttachHome' was successful.
Inventory content before and after oracle home attached
Before 
<HOME_LIST>
<HOME NAME="OraGI12Home1" LOC="/opt/app/oracle/product/12.1.0/grid" TYPE="O" IDX="1" CRS="true"/>
</HOME_LIST>

After
<HOME_LIST>
<HOME NAME="OraGI12Home1" LOC="/opt/app/oracle/product/12.1.0/grid_2" TYPE="O" IDX="1" CRS="true"/>
<HOME NAME="oracle_home" LOC="/opt/app/oracle/product/12.1.0/dbhome_2" TYPE="O" IDX="2"/>
</HOME_LIST>
18. Mount the ASM disk groups
SQL>  select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
FRA                            DISMOUNTED
DATA                           DISMOUNTED

alter diskgroup data mount;
alter diskgroup fra mount;
19. If does not exists create spfile alias and add the database to HAS configuration. Also db entry to oratab file.
mkalias parameterfile/spfile.266.891711477 spfilese2db.ora

srvctl add database -d se2db -o /opt/app/oracle/product/12.1.0/dbhome_2 -p +DATA/se2db/spfilese2db.ora -a "data,fra"
srvctl start database -d se2db
20. Add auto start options to the components
crsctl modify resource ora.DATA.dg -attr "AUTO_START"="always" -unsupported
crsctl modify resource ora.FRA.dg -attr "AUTO_START"="always"    -unsupported
crsctl modify resource ora.LISTENER.lsnr -attr "AUTO_START"="always" -unsupported
crsctl modify resource ora.se2db.db -attr "AUTO_START"="always"  -unsupported
21. The HAS stack should be now up and running
$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rhel7                    STABLE
ora.FRA.dg
               ONLINE  ONLINE       rhel7                    STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rhel7                    STABLE
ora.asm
               ONLINE  ONLINE       rhel7                    Started,STABLE
ora.ons
               OFFLINE OFFLINE      rhel7                    STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       rhel7                    STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       rhel7                    STABLE
ora.se2db.db
      1        ONLINE  ONLINE       rhel7                    Open,STABLE
--------------------------------------------------------------------------------
This concludes the steps for moving a role separated oracle restart setup to a single user setup.

Useful metalink notes
Oracle Restart: 12.1 grid installation root.sh fails while starting ohasd [ID 1997268.1]

Related Posts
Recover From a Clusterware Home Deletion
Moving non-RAC Database and ASM Between Servers
Changing The Cluster Name

Wednesday, June 1, 2016

Secure External Password Store and Oracle Restart with Role Separation

Oracle secure external password store allows storing of password needed for database connection in a client side Oracle wallet. It allows password-less connection to the database and useful in running scripts without having to put the password in them. A secure external password store could be setup as below.
Create a wallet (usually in TNS_ADMIN but could be different location as well)
cd $ORACLE_HOME/network/admin/

orapki wallet create -wallet . -pwd "welcome1" -auto_login_local
Oracle PKI Tool : Version 11.2.0.4.0 - Production
Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.

[oracle@rhel7 admin]$ ls
cwallet.sso  ewallet.p12  samples  shrept.lst  tnsnames.ora
The option "auto_login_local" prevents the wallet being moved to another host and used (1114599.1). It also ties the wallet to the operating system user preventing other users in the same host using the wallet as well (1505040.1). Create a credential, in this case std11g2 is a TNS entry that exists in tnsnames.ora file and asanga and asa are username and password
mkstore -wrl . -createCredential std11g2 asanga asa
Oracle Secret Store Tool : Version 11.2.0.4.0 - Production
Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.

Enter wallet password:
Create credential oracle.security.client.connect_string1
Add the wallet location to sqlnet.ora file and set WALLET_OVERRIDE to true which makes user names and passwords from the wallet to be used for authentication.
more sqlnet.ora
WALLET_LOCATION =
    (SOURCE =
       (METHOD = FILE)
       (METHOD_DATA = (DIRECTORY = /opt/app/oracle/product/11.2.0/dbhome_4/network/admin))
)

SQLNET.WALLET_OVERRIDE = TRUE
The existing credentials could be listed with the following command
mkstore -wrl . -listCredential
Oracle Secret Store Tool : Version 11.2.0.4.0 - Production
Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.

Enter wallet password:
List credential (index: connect_string username)
1: std11g2 asanga
With the external secure password store now setup, one could connected to the database without specifying the password as show below
[oracle@rhel7 orawallet]$ sqlplus  /@std11g2

SQL> show user
USER is "ASANGA"
Problem is that when secure external password store is setup for a Oracle restart (Single instance with ASM) database that has role separation (GI installed grid user and DB installed as oracle), the database fails to start during restarts and when srvctl is used. Below for a 11.2.0.4 setup
srvctl start database -d std11g2
PRCR-1079 : Failed to start resource ora.std11g2.db
CRS-5017: The resource action "ora.std11g2.db start" encountered the following error:
ORA-01078: failure in processing system parameters
. For details refer to "(:CLSN00107:)" in "/opt/app/oracle/product/11.2.0/grid_4/log/rhel6m1/agent/ohasd/oraagent_grid//oraagent_grid.log".

CRS-2674: Start of 'ora.std11g2.db' on 'rhel6m1' failed
Below for a 12.1.0.2 setup
srvctl start database -d se2db
PRCR-1079 : Failed to start resource ora.se2db.db
CRS-5017: The resource action "ora.se2db.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/se2db/spfilese2db.ora'
ORA-17503: ksfdopn:10 Failed to open file +DATA/se2db/spfilese2db.ora
ORA-12578: TNS:wallet open failed
. For details refer to "(:CLSN00107:)" in "/opt/app/oracle/diag/crs/rhel7/crs/trace/ohasd_oraagent_grid.trc".

CRS-2674: Start of 'ora.se2db.db' on 'rhel7' failed
MOS notes 1612712.1,1383938.1 says to set the permission to 750 for this issue but it didn't work.



During further investigation of the issue following observations were made.
1. The problem doesn't happen when both grid home and oracle home are installed under one user (usually as oracle user).
2. Problem doesn't happen in RAC environments even if grid home and oracle home are installed as different users (grid home as grid user and oracle home as oracle user)
3. Problem only happens in single instance environments (tested on 12c and 11.2.0.4) where grid home is owned by grid user and oracle home is owned by oracle user.
Looking at the trace file shows that srvctl fails to connect to idle instance to start the database
2016-03-30 11:11:12.474778 :CLSDYNAM:2930841344: [ora.se2db.db]{0:0:45668} [start] clsnInstConnection::makeConnectStr UsrOraEnv ,ORACLE_BASE= m_oracleHome /opt/app/oracle/product/12.1.0/dbhome_2 Crshome /opt/app/oracle/product/12.1.0/grid
2016-03-30 11:11:12.474822 :CLSDYNAM:2930841344: [ora.se2db.db]{0:0:45668} [start] makeConnectStr = (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/opt/app/oracle/product/12.1.0/dbhome_2/bin/oracle)(ARGV0=oraclese2db)(ENVS='ORACLE_HOME=/opt/app/oracle/product/12.1.0/dbhome_2,ORACLE_SID=se2db,LD_LIBRARY_PATH=,ORACLE_BASE=')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(CONNECT_DATA=(SID=se2db))))
2016-03-30 11:11:12.475460 :CLSDYNAM:2930841344: [ora.se2db.db]{0:0:45668} [start] InstAgent::stop non pool pConnxn 1 9434f9e0
2016-03-30 11:11:12.475519 :CLSDYNAM:2930841344: [ora.se2db.db]{0:0:45668} [start] InstConnection::connectInt: server not attached
2016-03-30 11:11:13.498240 :CLSDYNAM:2930841344: [ora.se2db.db]{0:0:45668} [start] InstConnection:connectInt connected
2016-03-30 11:11:13.498315 :CLSDYNAM:2930841344: [ora.se2db.db]{0:0:45668} [start] InstConnection::shutdown mode 4
2016-03-30 11:11:13.499364 :CLSDYNAM:2930841344: [ora.se2db.db]{0:0:45668} [start] ORA-01034: ORACLE not available
Process ID: 0
Session ID: 0 Serial number: 0

2016-03-30 11:11:13.499453 :CLSDYNAM:2930841344: [ora.se2db.db]{0:0:45668} [start] InstConnection::disassociateEdition OCI error 1034
Then the start of the database was tested a wallet created without the "local" option. As mentioned earlier local option prevents wallet being used in another host and is tied to the OS user. On 11.2.0.4
orapki wallet create -wallet . -pwd "welcome1" -auto_login
Oracle PKI Tool : Version 11.2.0.4.0 - Production
Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.

srvctl start database -d std11g2
srvctl status database -d std11g2
Database is running.
On 12.1.0.2
orapki wallet create -wallet . -pwd "welcome1" -auto_login
Oracle PKI Tool : Version 12.1.0.2
Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved.

srvctl start database -d se2db
srvctl status database -d se2db
Database is running.
So it appears the problem seem to be use of auto_login_local since there's no issue when using auto_login. Since wallet wasn't moved to a different server, the only reason that could prevent DB from starting with a wallet created with auto_login_local option is if the starting command is executed by a user other than the user tied to the wallet. Right now the wallet has been created as Oracle user and srvctl start was executed as oracle user. So it appears that grid user gets involved in starting the DB even when srvctl is run as oracle user (why the same is not happening in RAC role separated config is under SR)
So the solution in this case was to create an empty wallet as grid user and set permission to 750.
[grid@rhel7 wallet]$ orapki wallet create -wallet . -pwd "welcome1" -auto_login_local
[grid@rhel7 wallet]$ chmod 750 cwallet.sso ewallet.p12
[grid@rhel7 wallet]$ ls -lrt
-rw-rw-rw-. 1 grid   oinstall     0 Apr 15 17:35 ewallet.p12.lck
-rwxr-x---. 1 grid   oinstall    75 Apr 15 17:35 ewallet.p12
-rw-rw-rw-. 1 grid   oinstall     0 Apr 15 17:35 cwallet.sso.lck
-rwxr-x---. 1 grid   oinstall   120 Apr 15 17:35 cwallet.sso
No credentials are added to this wallet as such nothing to list
[grid@rhel7 wallet]$  mkstore -wrl . -listCredential
Oracle Secret Store Tool : Version 12.1.0.2
Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved.

Enter wallet password:
List credential (index: connect_string username)
Used a non-default location to this wallet file and this was added to $ORACLE_HOME/network/admin/sqlnet.ora file(not to $GI_HOME/network/admin/sqlnet.ora)
pwd
/opt/app/oracle/product/12.1.0/dbhome_2/network/admin

more sqlnet.ora
WALLET_LOCATION =
    (SOURCE =
       (METHOD = FILE)
       (METHOD_DATA = (DIRECTORY = /usr/local/wallet))
)

SQLNET.WALLET_OVERRIDE = TRUE
With this setup in place srvctl works fine and database get started automatically after restarts. However password-less access fails as the wallet is owned by grid user
[oracle@rhel7 orawallet]$ sqlplus  /@se2db

ERROR:
ORA-12578: TNS:wallet open failed
To fix this, create another wallet at a different location as Oracle user.
ls /usr/local/wallet/orawallet

-rwxrwx---. 1 oracle oinstall 528 Apr 15 17:20 ewallet.p12
-rwxrwx---. 1 oracle oinstall 573 Apr 15 17:20 cwallet.sso
Create another sqlnet.ora file in the same directory (orawallet). Add this directory path as the wallet location
WALLET_LOCATION =
    (SOURCE =
       (METHOD = FILE)
       (METHOD_DATA = (DIRECTORY = /usr/local/wallet/orawallet))
)

SQLNET.WALLET_OVERRIDE = TRUE
Whenever password-less access using external password store is needed export this location as the TNS_ADMIN and connect.
export TNS_ADMIN=/usr/local/wallet/orawallet
sqlplus  /@se2db

SQL> show user
USER is "ASANGA"
If any database scripts require password-less access before running export the TNS_ADMIN to location where second sqlnet.ora file was set.
Why the auto_login_local fails this way in Oracle restart with role separation is still under SR.

Useful metalink notes
Using a Secure External Password Store with the JDBC Thin Driver [ID 1441745.1]
Using The Secure External Password Store [ID 340559.1]
How To Avoid Expdp And Impdp Passwords Being Visible By "ps" Unix Command? [ID 869825.1]
How To Prevent The Secure Password Store Wallet From Being Moved to Another Host [ID 1114599.1]
Oracle Cluster failed to start with ASM instance getting ORA-00443 [ID 2000868.1]
Database Failed To Start [ID 1922401.1]
Oracle Restart: srvctl fails to start database with error CRS-5010 if RDBMS and Grid under different users [ID 1335607.1]
How to configure SEPS for the pluggable databases [ID 1980698.1]
Bug 11706168 - ORA-00000 during STARTUP with SQLNET.WALLET_OVERRIDE=TRUE [ID 11706168.8]
How To Configure The Secure External Password Store To Allow The Connection To RMAN Catalog? [ID 1383938.1]
ORA-15055 and ORA-12578 on database startup with external wallet store [ID 1612712.1]
How To Check Whether The Wallet Is A Local Auto Login Wallet [ID 1505040.1]