Monday, April 14, 2014

Adding a Node to 12cR1 RAC

This post lists the steps for adding a node to a 12cR1 standard cluster (not a flex cluster), which is similar to adding a node to 11gR2 RAC. Node addition is done in three phases. Phase one adds the clusterware to the new node. Phase two adds the database software, and the final phase extends the database to the new node by creating a new instance for it. Node addition can be done in silent mode or interactively using the GUIs. This post uses the latter method (the earlier 11gR2 post used silent mode, and the steps for 12c are similar).
1. It is assumed that the physical connections (shared storage, network) to the new node are already in place. The prerequisites can be checked with cluvfy by running the pre node add stage check from an existing node and passing the hostname of the new node (in this case rhel12c2 is the new node).
[grid@rhel12c1 ~]$ cluvfy stage -pre nodeadd -n rhel12c2

Performing pre-checks for node addition

Checking node reachability...
Node reachability check passed from node "rhel12c1"

Checking user equivalence...
User equivalence check passed for user "grid"
Package existence check passed for "cvuqdisk"

Checking CRS integrity...

CRS integrity check passed

Clusterware version consistency passed.

Checking shared resources...

Checking CRS home location...
Location check passed for: "/opt/app/12.1.0/grid"
Shared resources check for node addition passed

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity using interfaces on subnet "192.168.0.0"
Node connectivity passed for subnet "192.168.0.0" with node(s) rhel12c1,rhel12c2
TCP connectivity check passed for subnet "192.168.0.0"

Check: Node connectivity using interfaces on subnet "192.168.1.0"
Node connectivity passed for subnet "192.168.1.0" with node(s) rhel12c1,rhel12c2
TCP connectivity check passed for subnet "192.168.1.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.0.0".
Subnet mask consistency check passed for subnet "192.168.1.0".
Subnet mask consistency check passed.

Node connectivity check passed

Checking multicast communication...

Checking subnet "192.168.1.0" for multicast communication with multicast group "224.0.0.251"...
Check of subnet "192.168.1.0" for multicast communication with multicast group "224.0.0.251" passed.

Check of multicast communication passed.
Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "rhel12c2:/usr,rhel12c2:/var,rhel12c2:/etc,rhel12c2:/opt/app/12.1.0/grid,rhel12c2:/sbin,rhel12c2:/tmp"
Free disk space check passed for "rhel12c1:/usr,rhel12c1:/var,rhel12c1:/etc,rhel12c1:/opt/app/12.1.0/grid,rhel12c1:/sbin,rhel12c1:/tmp"
Check for multiple users with UID value 501 passed
User existence check passed for "grid"
Run level check passed
Hard limits check passed for "maximum open file descriptors"
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "binutils"
Package existence check passed for "compat-libcap1"
Package existence check passed for "compat-libstdc++-33(x86_64)"
Package existence check passed for "libgcc(x86_64)"
Package existence check passed for "libstdc++(x86_64)"
Package existence check passed for "libstdc++-devel(x86_64)"
Package existence check passed for "sysstat"
Package existence check passed for "gcc"
Package existence check passed for "gcc-c++"
Package existence check passed for "ksh"
Package existence check passed for "make"
Package existence check passed for "glibc(x86_64)"
Package existence check passed for "glibc-devel(x86_64)"
Package existence check passed for "libaio(x86_64)"
Package existence check passed for "libaio-devel(x86_64)"
Package existence check passed for "nfs-utils"
Check for multiple users with UID value 0 passed
Current group ID check passed

Starting check for consistency of primary group of root user

Check for consistency of root user's primary group passed
Group existence check passed for "asmadmin"
Group existence check passed for "asmoper"
Group existence check passed for "asmdba"

Checking ASMLib configuration.
Check for ASMLib configuration passed.

Checking OCR integrity...

OCR integrity check passed

Checking Oracle Cluster Voting Disk configuration...

Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed

Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
No NTP Daemons or Services were found to be running

Clock synchronization check using Network Time Protocol(NTP) passed

User "grid" is not part of "root" group. Check passed
Checking integrity of file "/etc/resolv.conf" across nodes

"domain" and "search" entries do not coexist in any  "/etc/resolv.conf" file
All nodes have same "search" order defined in file "/etc/resolv.conf"
The DNS response time for an unreachable node is within acceptable limit on all nodes

Check for integrity of file "/etc/resolv.conf" passed

Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ...
Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed

Pre-check for node addition was successful.
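The report above is long, and a single failed check is easy to miss when scrolling through it. The following is a minimal sketch, assuming the cluvfy output has been redirected to a file; the function name cluvfy_problems and the log path in the usage comment are hypothetical, and the pattern is only a guess at how problems are worded (PRVF-/PRVG- are the usual cluvfy message prefixes), so treat an empty result as a hint rather than proof.

```shell
# Print every line of a saved cluvfy report that looks like a problem.
# $1 = file containing the cluvfy output.
# Exits non-zero when nothing suspicious is found (grep's own exit status).
cluvfy_problems() {
    grep -Ei 'fail|error|unsuccessful|PRVF-|PRVG-' "$1"
}

# Possible usage (log path is an assumption):
#   cluvfy stage -pre nodeadd -n rhel12c2 > /tmp/pre_nodeadd.log
#   cluvfy_problems /tmp/pre_nodeadd.log || echo "no problem lines found"
```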
2. To extend the cluster by installing the clusterware on the new node, run addNode.sh in the $GI_HOME/addnode directory as the grid user from an existing node. As mentioned earlier, this post uses the interactive method to add the node.
Click the add button and enter the hostname and VIP name of the new node.

Fix any pre-req issues and click install to begin the GI installation on the new node.
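For comparison, the silent-mode equivalent of the GUI steps passes the new node and its VIP name on the command line. The sketch below only prints the command rather than running it. The CLUSTER_NEW_NODES and CLUSTER_NEW_VIRTUAL_HOSTNAMES parameter names are as I recall them from the Oracle documentation and should be verified against your release; the function name and the VIP name rhel12c2-vip are hypothetical.

```shell
# Print (do not execute) a silent-mode add node command for one new node.
# $1 = path to the add node script
# $2 = hostname of the new node
# $3 = VIP name of the new node (hypothetical in the example below)
silent_addnode_cmd() {
    printf '%s -silent "CLUSTER_NEW_NODES={%s}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={%s}"\n' \
        "$1" "$2" "$3"
}

# Possible usage, run as the grid user on an existing node:
#   silent_addnode_cmd "$GI_HOME/addnode/addNode.sh" rhel12c2 rhel12c2-vip
```

Printing the command first makes it easy to review the node and VIP names before anything is changed on the cluster.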

Execute the root scripts on the new node.

Output from the root script execution:
[root@rhel12c2 12.1.0]# /opt/app/12.1.0/grid/root.sh
Performing root user operation for Oracle 12c

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /opt/app/12.1.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /opt/app/12.1.0/grid/crs/install/crsconfig_params
2014/03/04 16:16:06 CLSRSC-363: User ignored prerequisites during installation

OLR initialization - successful
2014/03/04 16:16:43 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.conf'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rhel12c2'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rhel12c2'
CRS-2677: Stop of 'ora.drivers.acfs' on 'rhel12c2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rhel12c2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'rhel12c2'
CRS-2672: Attempting to start 'ora.evmd' on 'rhel12c2'
CRS-2676: Start of 'ora.evmd' on 'rhel12c2' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'rhel12c2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rhel12c2'
CRS-2676: Start of 'ora.gpnpd' on 'rhel12c2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rhel12c2'
CRS-2676: Start of 'ora.gipcd' on 'rhel12c2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rhel12c2'
CRS-2676: Start of 'ora.cssdmonitor' on 'rhel12c2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rhel12c2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rhel12c2'
CRS-2676: Start of 'ora.diskmon' on 'rhel12c2' succeeded
CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'rhel12c2'
CRS-2676: Start of 'ora.cssd' on 'rhel12c2' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rhel12c2'
CRS-2672: Attempting to start 'ora.ctssd' on 'rhel12c2'
CRS-2676: Start of 'ora.ctssd' on 'rhel12c2' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rhel12c2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rhel12c2'
CRS-2676: Start of 'ora.asm' on 'rhel12c2' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rhel12c2'
CRS-2676: Start of 'ora.storage' on 'rhel12c2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rhel12c2'
CRS-2676: Start of 'ora.crsd' on 'rhel12c2' succeeded
CRS-6017: Processing resource auto-start for servers: rhel12c2
CRS-2672: Attempting to start 'ora.ons' on 'rhel12c2'
CRS-2676: Start of 'ora.ons' on 'rhel12c2' succeeded
CRS-6016: Resource auto-start has completed for server rhel12c2
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2014/03/04 16:22:11 CLSRSC-343: Successfully started Oracle clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2014/03/04 16:22:37 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded
This concludes phase one. The next phase is to add the database software to the new node.



3. To add the database software, run addNode.sh in the $ORACLE_HOME/addnode directory as the oracle user from an existing node. When the OUI starts, the new node is selected by default.

At the end of the database software installation, a message is shown asking to invoke DBCA to extend the database to the new node.

This concludes phase two.
4. The final phase is to extend the database to the new node. Before invoking DBCA, change the permissions on the $ORACLE_BASE/admin directory to give the oinstall group write permission, so that the oracle user is able to write into the directory. After the database software was installed, the permissions on this directory were as follows:
[oracle@rhel12c2 oracle]$ ls -l
drwxr-xr-x. 3 grid   oinstall 4096 Mar  4 16:21 admin
Since the directory is owned by grid and the oinstall group has no write permission, the oracle user cannot write into it and DBCA fails with an error.
Change the permissions with
chmod 775 admin
and invoke the DBCA.
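The permission check can also be scripted before starting DBCA. This is a minimal sketch that inspects the mode string printed by ls -ld; the function name group_writable is hypothetical and not part of any Oracle tooling.

```shell
# Succeed if the group permission bits of path $1 include write.
# In a mode string such as drwxr-xr-x, character 6 is the group write bit.
group_writable() {
    perms=$(ls -ld "$1" | cut -c1-10)
    case "$perms" in
        ?????w????) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage:
#   group_writable "$ORACLE_BASE/admin" || chmod 775 "$ORACLE_BASE/admin"
```

Because the directory in the listing above is owned by grid, the chmod has to be run by grid or root, not by the oracle user.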
5. Select instance management in DBCA and then add instance.
Select which database to extend (if there are multiple databases in the cluster) and confirm the new instance details (which come auto-populated).

6. Check that the instance is visible in the cluster.
[oracle@rhel12c1 addnode]$ srvctl config database -d cdb12c
Database unique name: cdb12c
Database name: cdb12c
Oracle home: /opt/app/oracle/product/12.1.0/dbhome_1
Oracle user: oracle
Spfile: +DATA/cdb12c/spfilecdb12c.ora
Password file: +DATA/cdb12c/orapwcdb12c
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: cdb12c
Database instances: cdb12c1,cdb12c2
Disk Groups: DATA,FLASH
Mount point paths:
Services: pdbsvc
Type: RAC
Start concurrency:
Stop concurrency:
Database is administrator managed

SQL> select inst_id,instance_name,host_name from gv$instance;

   INST_ID INSTANCE_NAME    HOST_NAME
---------- ---------------- -------------------
         1 cdb12c1          rhel12c1.domain.net
         2 cdb12c2          rhel12c2.domain.net


SQL> select con_id,name from gv$pdbs;

    CON_ID NAME
---------- ---------
         2 PDB$SEED
         3 PDB12C
         2 PDB$SEED
         3 PDB12C
The service created for this PDB is not yet available on the new node. As seen below, only one instance appears as a preferred instance and there are no available instances.
[oracle@rhel12c1 ~]$ srvctl config service -d cdb12c -s pdbsvc
Service name: pdbsvc
Service is enabled
Server pool: cdb12c_pdbsvc
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type:
Failover method:
TAF failover retries:
TAF failover delay:
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name: pdb12c
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Session State Consistency:
Preferred instances: cdb12c1
Available instances:
Modify the service to include the instance on the newly added node as well.
[oracle@rhel12c1 ~]$ srvctl modify service -db cdb12c -pdb pdb12c -s pdbsvc -modifyconfig -preferred "cdb12c1,cdb12c2"

[oracle@rhel12c1 ~]$ srvctl config service -d cdb12c -s pdbsvc
Service name: pdbsvc
Service is enabled
Server pool: cdb12c_pdbsvc
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type:
Failover method:
TAF failover retries:
TAF failover delay:
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name: pdb12c
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Session State Consistency:
Preferred instances: cdb12c1,cdb12c2
Available instances:
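The same verification can be done from a script by pulling the "Preferred instances" line out of the srvctl output. The following is a minimal sketch that relies only on the output format shown above; the function names preferred_instances and is_preferred are hypothetical.

```shell
# Extract the comma-separated preferred instance list from
# `srvctl config service` output supplied on stdin.
preferred_instances() {
    sed -n 's/^Preferred instances:[[:space:]]*//p'
}

# Succeed if instance $1 is in the preferred list read from stdin.
is_preferred() {
    preferred_instances | tr ',' '\n' | grep -qx "$1"
}

# Possible usage:
#   srvctl config service -d cdb12c -s pdbsvc | is_preferred cdb12c2 \
#       && echo "pdbsvc is preferred on cdb12c2"
```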
7. Use cluvfy to perform the post node add checks.
[grid@rhel12c1 ~]$ cluvfy stage -post nodeadd -n rhel12c2
This concludes the addition of a new node to 12cR1 RAC.

Related Posts
Adding a Node to 11gR2 RAC
Adding a Node to 11gR1 RAC