Sunday, July 1, 2018

JDBC Client Failover in Data Guard Configuration with PDBs

This post gives the highlights of setting up JDBC client failover in a data guard configuration with PDBs. For comprehensive set of steps refer the following white papers for 12c and for 11g.
The post shows how JDBC could be setup in an application that connects to single instance database in an Oracle restart such that JDBC connection failover to standby when a switchover or failover happens. The data guard setup used for this case is the same setup mentioned in the earlier post oracle Data Guard on 12.2 CDB with Oracle Restart.
1. By default ONS is disabled and offline in Oracle restart. In order to send FAN events ONS must be enabled and started in Oracle restart. This should be done in both primary and standby nodes.
srvctl enable ons
srvctl start ons
Once done check ONS status on both primary and standby.
crsctl stat res ora.ons
NAME=ora.ons
TYPE=ora.ons.type
TARGET=ONLINE
STATE=ONLINE on city7

crsctl stat res ora.ons
NAME=ora.ons
TYPE=ora.ons.type
TARGET=ONLINE
STATE=ONLINE on city7s
When ONS is enabled, stopping HAS throws up following error
crsctl stop has
...
CRS-2673: Attempting to stop 'ora.ons' on 'city7s'
CRS-5014: Agent "ORAAGENT" timed out starting process "/opt/app/oracle/product/12.2.0/grid/opmn/bin/onsctli" for action "stop": details at "(:CLSN00009:)" in "/opt/app/oracle/diag/crs/city7s/crs/trace/ohasd_oraagent_grid.trc"
CRS-2675: Stop of 'ora.ons' on 'city7s' failed
CRS-2679: Attempting to clean 'ora.ons' on 'city7s'
CRS-2681: Clean of 'ora.ons' on 'city7s' succeeded
This only happens during stopping of HAS and no such issue during start up of HAS and ONS service gets started along with other services. This appear to be a known issue in other version relating to RAC but nothing could be found on MOS with regard to 12.2 Oracle restart. SR was raised and this is being investigated under bug 28134413. In spite of this issue the failover works as expected.
Update: 2020-01-28 - As a result of the SR raised Oracle has created MOS doc 2631403.1 which now states this is expected behavior on SIHA.

2. Create a service and associate it with a PDB for application to connect. It's important that application connect to the database using the service for failover to work in the event of role transition. When a service is created for PDB, the PDB could be brought up by starting the service. However, stopping the service doesn't bring down the PDB but only the service is stopped. Following service was created on primary PDB.
srvctl add service -db prodcdb -pdb pdbapp1 -service devsrv -role PRIMARY -notification TRUE -failovertype NONE -failovermethod NONE -failoverdelay 0 -failoverretry 0 

srvctl config service -d prodcdb -s devsrv

Service name: devsrv
Cardinality: SINGLETON
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: true
Global: false
Commit Outcome: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name: pdbapp1
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout:
Stop option:
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Following service was created on standby PDB to be active when standby becomes primary. The service name must be same as the service created in the primary.
srvctl add service -db stbycdb -pdb pdbapp1 -service devsrv -role PRIMARY -notification TRUE -failovertype NONE -failovermethod NONE -failoverdelay 0 -failoverretry 0

srvctl config service -d stbycdb -s devsrv
Service name: devsrv
Cardinality: SINGLETON
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: true
Global: false
Commit Outcome: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name: pdbapp1
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout:
Stop option:
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Once service is created make sure patch for bug 26439462 (Doc ID 26439462.8) is applied. This bug prevents the bringing up of the PDB service automatically after role transition. Applying the latest RU for 12.2 (at the time of the testing it was 12.2.0.1.180417) resolved this issue. If the PDB service doesn't automatically starts then the JDBC failover will fail. This could be tested by carrying out a switchover and checking if the PDB service automatically comes up after role transition.

3. Configure the JDBC client to use UCP and enable FCF by setting ONS configuration settings. Details of this could be found on the 12c1 white paper and high availability best practice guide.



4. Create a TNS entry containing both primary and standby hosts and the service name created earlier. Use this TNS entry to connect to the database. To avoid ORA-12514 set (RETRY_COUNT x RETRY_DELAY) such that it is slightly higher than the total time for the switchover and to start the service.
DGTNS =
  (DESCRIPTION =
    (FAILOVER = on)(CONNECT_TIMEOUT=60)(RETRY_COUNT=40)(RETRY_DELAY=2)(TRANSPORT_CONNECT_TIMEOUT=1)
    (ADDRESS_LIST =
      (LOAD_BALANCE = yes)
      (ADDRESS = (PROTOCOL = TCP)(HOST = city7.domain.net)(PORT = 1581))
      (ADDRESS = (PROTOCOL = TCP)(HOST = city7s.domain.net)(PORT = 1581))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = devsrv)
    )
  )
5. Start the application and verify application server IP is listed in the ONS subscription list in each database server.

6. Do a switchover and check the application connectivity. For testing purpose a java application was created to output the connected database. The output below shows that initially it was connected to the PDB in the stbycdb CDB which was primary at that time. During the switchover time period while the standby DB is made primary and PDB and associated service is started connection could error. Once the service is up the JDBC connections succeeds.
Connected to stbycdb DB Server hpc1.domain.net Application Server on Mon May 21 09:55:50 BST 2018
Connected to stbycdb DB Server hpc1.domain.net Application Server on Mon May 21 09:55:51 BST 2018
Connected to stbycdb DB Server hpc1.domain.net Application Server on Mon May 21 09:55:52 BST 2018
Connected to prodcdb DB Server hpc1.domain.net Application Server on Mon May 21 09:56:10 BST 2018
Connected to prodcdb DB Server hpc1.domain.net Application Server on Mon May 21 09:56:11 BST 2018
Connected to prodcdb DB Server hpc1.domain.net Application Server on Mon May 21 09:56:13 BST 2018