The environment used in this case is a two-node RAC (11.2.0.4) with role separation. Under normal operation it has the following resources and statuses (formatted status).
Resource Name                  Type                      Target   State        Host
-------------                  ------                    -------  --------     ----------
ora.CLUSTER_DG.dg              ora.diskgroup.type        ONLINE   ONLINE       rhel6m1
ora.CLUSTER_DG.dg              ora.diskgroup.type        ONLINE   ONLINE       rhel6m2
ora.DATA.dg                    ora.diskgroup.type        ONLINE   ONLINE       rhel6m1
ora.DATA.dg                    ora.diskgroup.type        ONLINE   ONLINE       rhel6m2
ora.FLASH.dg                   ora.diskgroup.type        ONLINE   ONLINE       rhel6m1
ora.FLASH.dg                   ora.diskgroup.type        ONLINE   ONLINE       rhel6m2
ora.MYLISTENER.lsnr            ora.listener.type         ONLINE   ONLINE       rhel6m1
ora.MYLISTENER.lsnr            ora.listener.type         ONLINE   ONLINE       rhel6m2
ora.MYLISTENER_SCAN1.lsnr      ora.scan_listener.type    ONLINE   ONLINE       rhel6m2
ora.asm                        ora.asm.type              ONLINE   ONLINE       rhel6m1
ora.asm                        ora.asm.type              ONLINE   ONLINE       rhel6m2
ora.cvu                        ora.cvu.type              ONLINE   ONLINE       rhel6m2
ora.gsd                        ora.gsd.type              OFFLINE  OFFLINE
ora.gsd                        ora.gsd.type              OFFLINE  OFFLINE
ora.net1.network               ora.network.type          ONLINE   ONLINE       rhel6m1
ora.net1.network               ora.network.type          ONLINE   ONLINE       rhel6m2
ora.oc4j                       ora.oc4j.type             ONLINE   ONLINE       rhel6m2
ora.ons                        ora.ons.type              ONLINE   ONLINE       rhel6m1
ora.ons                        ora.ons.type              ONLINE   ONLINE       rhel6m2
ora.registry.acfs              ora.registry.acfs.type    ONLINE   ONLINE       rhel6m1
ora.registry.acfs              ora.registry.acfs.type    ONLINE   ONLINE       rhel6m2
ora.rhel6m1.vip                ora.cluster_vip_net1.type ONLINE   ONLINE       rhel6m1
ora.rhel6m2.vip                ora.cluster_vip_net1.type ONLINE   ONLINE       rhel6m2
ora.scan1.vip                  ora.scan_vip.type         ONLINE   ONLINE       rhel6m2
ora.std11g2.db                 ora.database.type         ONLINE   ONLINE       rhel6m1
ora.std11g2.db                 ora.database.type         ONLINE   ONLINE       rhel6m2
ora.std11g2.myservice.svc      ora.service.type          ONLINE   ONLINE       rhel6m1
ora.std11g2.myservice.svc      ora.service.type          ONLINE   ONLINE       rhel6m2
ora.std11g2.abx.domain.net.svc ora.service.type          ONLINE   ONLINE       rhel6m2
ora.std11g2.abx.domain.net.svc ora.service.type          ONLINE   ONLINE       rhel6m1
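The status listings shown in this post are a reformatted view of the clusterware resource state. The raw data can be obtained directly with crsctl, for example:

crsctl stat res -t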
After node 2 (the rhel6m2 node in this case) suffers a catastrophic failure, the resources and statuses are as below. There are offline and failed-over (VIP) resources from rhel6m2.

Resource Name                  Type                      Target   State        Host
-------------                  ------                    -------  --------     ----------
ora.CLUSTER_DG.dg              ora.diskgroup.type        ONLINE   ONLINE       rhel6m1
ora.DATA.dg                    ora.diskgroup.type        ONLINE   ONLINE       rhel6m1
ora.FLASH.dg                   ora.diskgroup.type        ONLINE   ONLINE       rhel6m1
ora.MYLISTENER.lsnr            ora.listener.type         ONLINE   ONLINE       rhel6m1
ora.MYLISTENER_SCAN1.lsnr      ora.scan_listener.type    ONLINE   ONLINE       rhel6m1
ora.asm                        ora.asm.type              ONLINE   ONLINE       rhel6m1
ora.cvu                        ora.cvu.type              ONLINE   ONLINE       rhel6m1
ora.gsd                        ora.gsd.type              OFFLINE  OFFLINE
ora.net1.network               ora.network.type          ONLINE   ONLINE       rhel6m1
ora.oc4j                       ora.oc4j.type             ONLINE   ONLINE       rhel6m1
ora.ons                        ora.ons.type              ONLINE   ONLINE       rhel6m1
ora.registry.acfs              ora.registry.acfs.type    ONLINE   ONLINE       rhel6m1
ora.rhel6m1.vip                ora.cluster_vip_net1.type ONLINE   ONLINE       rhel6m1
ora.rhel6m2.vip                ora.cluster_vip_net1.type ONLINE   INTERMEDIATE rhel6m1
ora.scan1.vip                  ora.scan_vip.type         ONLINE   ONLINE       rhel6m1
ora.std11g2.db                 ora.database.type         ONLINE   ONLINE       rhel6m1
ora.std11g2.db                 ora.database.type         ONLINE   OFFLINE
ora.std11g2.myservice.svc      ora.service.type          ONLINE   ONLINE       rhel6m1
ora.std11g2.myservice.svc      ora.service.type          ONLINE   OFFLINE
ora.std11g2.abx.domain.net.svc ora.service.type          ONLINE   OFFLINE
ora.std11g2.abx.domain.net.svc ora.service.type          ONLINE   ONLINE       rhel6m1

Removal of the failed node's resources begins at the database resource level. There are two services running, and both have the DB instance on the failed node as a preferred instance (output is condensed):
srvctl config service -d std11g2
Service name: myservice
Service is enabled
Server pool: std11g2_myservice
Cardinality: 2
...
Preferred instances: std11g21,std11g22
Available instances:

Service name: abx.domain.net
Service is enabled
Server pool: std11g2_abx.domain.net
Cardinality: 2
...
Preferred instances: std11g21,std11g22
Available instances:

Modify the service configuration so that only the surviving instances are set as preferred instances:
$ srvctl modify service -s myservice -d std11g2 -n -i std11g21 -f
$ srvctl modify service -s abx.domain.net -d std11g2 -n -i std11g21 -f

$ srvctl config service -d std11g2
Service name: myservice
Service is enabled
Server pool: std11g2_myservice
Cardinality: 1
..
Preferred instances: std11g21
Available instances:

Service name: abx.domain.net
Service is enabled
Server pool: std11g2_abx.domain.net
Cardinality: 1
..
Preferred instances: std11g21
Available instances:

$ srvctl status service -d std11g2
Service myservice is running on instance(s) std11g21
Service abx.domain.net is running on instance(s) std11g21

Remove the database instance on the failed node. The database configuration still lists both instances:
srvctl config database -d std11g2
Database unique name: std11g2
Database name: std11g2
...
Database instances: std11g21,std11g22
Disk Groups: DATA,FLASH
Mount point paths:
Services: myservice,abx.domain.net
Type: RAC
Database is administrator managed

The instance is removed using DBCA's instance management option. If the listener has a non-default name and port, DBCA will fail to connect to the database; to fix this, create a default listener (named LISTENER, on port 1521) for the duration of the operation. A sketch of creating one is shown at the end of this step. Also, if valid node checking for registration (VNCR) is used, remove the failed node from the registration (invited node) list.

Proceed to instance deletion by selecting the inactive instance on the failed node. As node 2 is not available, a warning is issued; click continue and proceed. During execution various other warnings appear, such as being unable to remove the /etc/oratab entry on the failed node; all of these can be ignored.

However, DBCA did not run to the end. At 67% (observed through repeated runs on this 11.2.0.4 environment) a dialog box appeared with no message text, just an OK button. Clicking it does not end the DBCA session but returns to the beginning; exit DBCA by clicking cancel afterwards. This does not appear to be a failure of DBCA to remove the instance. In fact, the instance is removed, as subsequent instance management operations list only the instance on the surviving node.
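Where a default listener is needed only for the duration of the DBCA run, it can be added and started with srvctl from the grid infrastructure home. This is a minimal sketch assuming the 11.2 defaults (name LISTENER, port 1521):

$ srvctl add listener -l LISTENER -p 1521
$ srvctl start listener -l LISTENER

If VNCR is in use, the failed node would also be dropped from the invited-node list of the existing listener (for example a REGISTRATION_INVITED_NODES_MYLISTENER entry in the grid home's listener.ora); the exact entry depends on how VNCR was configured in the environment.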
Querying the database also shows that the undo tablespace and redo logs of instance 2 (std11g22 in this case) have been removed, and only the surviving instance's undo tablespace and redo logs remain.

SQL> select name from v$tablespace;

NAME
------------------------------
SYSTEM
SYSAUX
UNDOTBS1
TEMP
USERS
EXAMPLE
TEST

7 rows selected.

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ ---------
         1          1       1598   52428800        512          2 NO  CURRENT               68471125 07-JUL-16   2.8147E+14
         2          1       1597   52428800        512          2 YES INACTIVE              68467762 07-JUL-16     68471125 07-JUL-16

srvctl config database -d std11g2
Database unique name: std11g2
Database name: std11g2
...
Database instances: std11g21
Disk Groups: DATA,FLASH
Mount point paths:
Services: myservice,abx.domain.net
Type: RAC
Database is administrator managed
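As an additional check, not part of the original output, the redo thread and undo configuration can be queried directly; after the instance removal only the surviving instance's thread and undo tablespace should be listed:

SQL> select thread#, status, enabled from v$thread;
SQL> select tablespace_name from dba_tablespaces where contents = 'UNDO';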
Once the database resources are removed, the next step is to remove the Oracle database home entry for the failed node from the inventory. As the node is unavailable, there is no de-installation involved; simply run the inventory update command with only the surviving nodes. The inventory content for the Oracle home before the failed node is removed:
<HOME NAME="OraDb11g_home2" LOC="/opt/app/oracle/product/11.2.0/dbhome_4" TYPE="O" IDX="4"> <NODE_LIST> <NODE NAME="rhel6m1"/> <NODE NAME="rhel6m2"/> </NODE_LIST> </HOME>After the inventory update
./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={rhel6m1}"
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 4095 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oraInventory
'UpdateNodeList' was successful.

<HOME NAME="OraDb11g_home2" LOC="/opt/app/oracle/product/11.2.0/dbhome_4" TYPE="O" IDX="4">
   <NODE_LIST>
      <NODE NAME="rhel6m1"/>
   </NODE_LIST>
</HOME>

The next step is to remove the cluster resources and the node itself. If any of the nodes is in a pinned state, unpin it (a sketch follows the node listing below). In this case both nodes are unpinned:
olsnodes -s -t
rhel6m1 Active   Unpinned
rhel6m2 Inactive Unpinned
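Nothing needs unpinning here, but if a node were in a pinned state it could be unpinned with crsctl before deletion. A minimal sketch, run as root from the grid home, using this environment's failed node name:

# crsctl unpin css -n rhel6m2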
Stop and remove the VIP resource of the failed node:

# srvctl stop vip -i rhel6m2-vip -f
# srvctl remove vip -i rhel6m2-vip -f

Remove the failed node from the cluster configuration:
# crsctl delete node -n rhel6m2
CRS-4661: Node rhel6m2 successfully deleted.
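Though not shown in the original output, re-running olsnodes at this point should confirm that rhel6m2 no longer appears in the node list:

olsnodes -s -t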
<HOME NAME="Ora11g_gridinfrahome2" LOC="/opt/app/11.2.0/grid4" TYPE="O" IDX="3" CRS="true"> <NODE_LIST> <NODE NAME="rhel6m1"/> <NODE NAME="rhel6m2"/> </NODE_LIST> </HOME>After inventory update
./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={rhel6m1}" CRS=TRUE
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 4095 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oraInventory
'UpdateNodeList' was successful.

<HOME NAME="Ora11g_gridinfrahome2" LOC="/opt/app/11.2.0/grid4" TYPE="O" IDX="3" CRS="true">
   <NODE_LIST>
      <NODE NAME="rhel6m1"/>
   </NODE_LIST>
</HOME>

Validate the node removal with cluvfy:
cluvfy stage -post nodedel -n rhel6m2

Performing post-checks for node removal

Checking CRS integrity...

Clusterware version consistency passed

CRS integrity check passed

Node removal check passed

Post-check for node removal was successful.

Remove the default listener if one was created during the instance removal step (a sketch is shown below).
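A minimal sketch of removing the temporary default listener, assuming it was created with the defaults shown earlier:

$ srvctl stop listener -l LISTENER
$ srvctl remove listener -l LISTENER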
The final status of the resources is as below.

Resource Name                  Type                      Target   State        Host
-------------                  ------                    -------  --------     ----------
ora.CLUSTER_DG.dg              ora.diskgroup.type        ONLINE   ONLINE       rhel6m1
ora.DATA.dg                    ora.diskgroup.type        ONLINE   ONLINE       rhel6m1
ora.FLASH.dg                   ora.diskgroup.type        ONLINE   ONLINE       rhel6m1
ora.MYLISTENER.lsnr            ora.listener.type         ONLINE   ONLINE       rhel6m1
ora.MYLISTENER_SCAN1.lsnr      ora.scan_listener.type    ONLINE   ONLINE       rhel6m1
ora.asm                        ora.asm.type              ONLINE   ONLINE       rhel6m1
ora.cvu                        ora.cvu.type              ONLINE   ONLINE       rhel6m1
ora.gsd                        ora.gsd.type              OFFLINE  OFFLINE
ora.net1.network               ora.network.type          ONLINE   ONLINE       rhel6m1
ora.oc4j                       ora.oc4j.type             ONLINE   ONLINE       rhel6m1
ora.ons                        ora.ons.type              ONLINE   ONLINE       rhel6m1
ora.registry.acfs              ora.registry.acfs.type    ONLINE   ONLINE       rhel6m1
ora.rhel6m1.vip                ora.cluster_vip_net1.type ONLINE   ONLINE       rhel6m1
ora.scan1.vip                  ora.scan_vip.type         ONLINE   ONLINE       rhel6m1
ora.std11g2.db                 ora.database.type         ONLINE   ONLINE       rhel6m1
ora.std11g2.myservice.svc      ora.service.type          ONLINE   ONLINE       rhel6m1
ora.std11g2.abx.domain.net.svc ora.service.type          ONLINE   ONLINE       rhel6m1

Useful metalink notes
How to remove/delete a node from Grid Infrastructure Clusterware when the node has failed [ID 1262925.1]
Steps to Remove Node from Cluster When the Node Crashes Due to OS/Hardware Failure and cannot boot up [ID 466975.1]
RAC on Windows: How to Remove a Node from a Cluster When the Node Crashes Due to OS/Hardware Failure and Cannot Boot [ID 832054.1]
Related Post
Deleting a Node From 12cR1 RAC
Deleting a Node From 11gR2 RAC
Deleting a 11gR1 RAC Node