Saturday, September 8, 2018

DG Broker Continues With Status "ROLLING DATABASE MAINTENANCE IN PROGRESS" Even After ROLLBACK_PLAN is Called

With Oracle 12c a new PL/SQL package called DBMS_ROLLING was introduced, which greatly automates the rolling upgrade of databases with minimal downtime and maximum protection. Refer to MOS note 2086512.1 and the white paper from MAA. Use of DBMS_ROLLING requires the Active Data Guard license option. DBMS_ROLLING can only be used on versions 12.1.0.2 or above. For other versions use the transient logical standby method for the upgrade.
The database version used for this post is 12.2.0.1. Prior releases required disabling the DG broker when DBMS_ROLLING was used, but there are improvements in the 12.2 version according to the Oracle documentation: as of Oracle Database 12c Release 2 (12.2.0.1), Data Guard broker can remain on during a DBMS_ROLLING rolling upgrade; there is no longer any need to disable it.
Broker support is enabled by default during execution of the DBMS_ROLLING.BUILD_PLAN procedure if the broker is enabled at the time of the call. When broker support is enabled, the broker sets up the redo transport destinations as necessary from the original primary database as well as from the rolling upgrade target, manages instance count on the upgrade target if it is an Oracle RAC database, and notifies Oracle Clusterware and Global Data Services as appropriate during the course of the rolling upgrade.
Broker support can be manually controlled using the DBMS_ROLLING parameter DGBROKER. Although role transitions are typically performed using the broker, the switchover step in a rolling upgrade should continue to be performed using the DBMS_ROLLING.SWITCHOVER procedure.

Furthermore, the broker admin doc states "When an upgrade is performed using the PL/SQL package, DBMS_ROLLING, the DG_BROKER_CONFIG_FILEn parameters must specify a location outside of Oracle home. (The default location is the dbs directory in Oracle home.)"
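If the broker configuration files are still in the default location, they can be relocated before starting the rolling upgrade. A minimal sketch is below; the ASM paths are illustrative, and DG_BROKER_CONFIG_FILEn can only be changed while the broker is stopped.
SQL> alter system set dg_broker_start=false;
SQL> alter system set dg_broker_config_file1='+DATA/COLOMBO/dr1colombo.dat';
SQL> alter system set dg_broker_config_file2='+FRA/COLOMBO/dr2colombo.dat';
SQL> alter system set dg_broker_start=true;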
The current Data Guard configuration consists of two databases, and the DG broker status is shown below.
DGMGRL> show configuration

Configuration - dg12c2

  Protection Mode: MaxAvailability
  Members:
  colombo - Primary database
    london  - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 44 seconds ago)
The supplemental logging status is as below.
select SUPPLEMENTAL_LOG_DATA_ALL,SUPPLEMENTAL_LOG_DATA_MIN,
SUPPLEMENTAL_LOG_DATA_PK,SUPPLEMENTAL_LOG_DATA_UI,
SUPPLEMENTAL_LOG_DATA_FK,SUPPLEMENTAL_LOG_DATA_PL 
from v$database;

SUP SUPPLEME SUP SUP SUP SUP
--- -------- --- --- --- ---
NO  NO       NO  NO  NO  NO
In previous versions, after the DBMS_ROLLING upgrade the supplemental logging settings had to be manually rolled back. But with 12.2 this is not necessary, as seen later in the post.
This post shows the steps up to starting the upgrade plan and then rolling back. In the critical path, after the start plan one would move on to the switchover. Rollback cannot be called once the switchover has happened (refer to page 5 of the white paper mentioned above).
1. To begin the upgrade using DBMS_ROLLING, put the standby in mount mode and run INIT_PLAN. As there are only two databases in the configuration, the sole standby is made the future primary.
SQL> exec DBMS_ROLLING.INIT_PLAN(future_primary=>'london');
PL/SQL procedure successfully completed.
Check the plan parameter status with
select scope, name, curval from dba_rolling_parameters order by scope, name;
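Plan parameters can be adjusted before the plan is built using DBMS_ROLLING.SET_PARAMETER. A hedged example is below; the parameter name and value are illustrative, so check DBA_ROLLING_PARAMETERS for the valid set on your version.
SQL> exec DBMS_ROLLING.SET_PARAMETER(name=>'ACTIVE_SESSIONS_WAIT', value=>'3600');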
2. Next execute the build plan and check the plan values.
SQL> exec DBMS_ROLLING.BUILD_PLAN;
PL/SQL procedure successfully completed.

SELECT instid, target, phase, description FROM DBA_ROLLING_PLAN;
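As an aside, progress and errors during plan execution can also be monitored through the DBA_ROLLING_EVENTS view (a quick sketch; column names may vary slightly by version).
SQL> select * from dba_rolling_events order by event_time;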
3. Run the start plan to begin the conversion of the physical standby to a transient logical standby.
SQL> exec DBMS_ROLLING.START_PLAN;
PL/SQL procedure successfully completed.
This step does the following things.

i. Backing up the control file for each database to a trace file. These can be verified in the alert logs of the primary and standby. On the primary alert log
RTS(17286): executing rolling upgrade instruction 26 from plan revision 1
alter database backup controlfile to 'rolling_change_backup.f' reuse
Completed: alter database backup controlfile to 'rolling_change_backup.f' reuse
On the standby alert log
RTS(2699): executing rolling upgrade instruction 28 from plan revision 1
alter database backup controlfile to 'rolling_change_backup.f' reuse
Completed: alter database backup controlfile to 'rolling_change_backup.f' reuse
ii. Creating flashback database guaranteed restore points on both the primary and standby. Alert log entries similar to the below can be seen on both databases.
2018-08-16T06:32:40.016320-04:00
RTS(17286): executing rolling upgrade instruction 22 from plan revision 1
Created guaranteed restore point DBMSRU_INITIAL
iii. Building a LogMiner dictionary at the primary database. The following entries can be seen on the primary alert log
Thu Aug 16 06:32:49 2018
Logminer Bld: Done
LOGMINER: PLSQL Quiesce mode enabled
LOGMINER: Dictionary Build: Waiting for completion of transactions started at or before scn 11927360 (0x0000000000b5ff40)
LOGMINER: Dictionary Build: All transactions started at or before scn 11927360 (0x0000000000b5ff40) have completed
2018-08-16T06:32:49.308962-04:00
LOGSTDBY: Dictionary Build PLSQL Quiesce - SID: 2268, wait_init: 300 sec, wait_timeout: 3600 sec.
iv. Recovering the designated physical standby into a transient logical standby database. The following can be seen on the standby alert log.
RECOVER TO LOGICAL STANDBY: Complete - Database mounted as logical standby
Completed: alter database recover to logical standby keep identity
At this stage the DG broker will show the standby as a transient logical standby database and give a warning status, as rolling database maintenance is active. Importantly (the reason for this post), the DG broker status says "ROLLING DATABASE MAINTENANCE IN PROGRESS".
DGMGRL> show configuration

Configuration - dg12c2

  Protection Mode: MaxAvailability
  Members:
  colombo - Primary database
    london  - Transient logical standby database

Fast-Start Failover: DISABLED

Configuration Status:
    ROLLING DATABASE MAINTENANCE IN PROGRESS


DGMGRL> show database london

Database - london

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          2 seconds (computed 0 seconds ago)
  Average Apply Rate: (unknown)
  Real Time Query:    OFF
  Instance(s):
    london

  Database Warning(s):
    ORA-16866: database converted to transient logical standby database for rolling database maintenance

Database Status:
WARNING
v. Loading the LogMiner dictionary into the logical standby database. The following entries can be seen on the standby alert log.
LOGMINER: Begin mining logfile during dictionary load for session 1 thread 1 sequence 267, +FRA/LONDON/foreign_archivelog/colombo/2018_08_16/thread_1_seq_267.365.984292375
LOGMINER: Preparing to load 1 dictionaries for session 1
LOGMINER: Finalizing dictionary load for session 1
LOGMINER: Gathering statistics on logminer dictionary. (incremental, nonparallel)
LOGMINER: Completed dictionary load for session 1
LOGMINER: End mining logfiles during dictionary load for session 1
vi. Configuring LGS databases with the transient logical standby database. As there were no other standbys in the configuration, there were no leading group standbys (LGS) to configure.
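The guaranteed restore points created in step ii can also be verified directly on either database. A minimal check using v$restore_point:
SQL> select name, scn, time from v$restore_point where guarantee_flashback_database='YES';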



4. Ideally, from this point one would upgrade the standby to the new DB version, do a switchover, and upgrade the trailing group master. But if for some reason a rollback is needed, it has to happen before the switchover is done. This post is based on an issue encountered after the rollback. In order to roll back the steps done by DBMS_ROLLING, put the standby DB in mount mode and execute ROLLBACK_PLAN.
SQL> select database_role, open_mode from v$database;

DATABASE_ROLE      OPEN_MODE
----------------   -----------
LOGICAL STANDBY    READ WRITE

SQL> startup force mount;

SQL> select open_mode from v$database;

OPEN_MODE
-----------
MOUNTED

SQL> exec DBMS_ROLLING.ROLLBACK_PLAN;
PL/SQL procedure successfully completed.
5. The rollback command converts the transient logical standby back to a physical standby and also rolls back the supplemental logging settings. If the rollback fails, the supplemental logging settings must be manually cleared (see the sketch after the output below).
SQL> select database_role,open_mode from v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------
PHYSICAL STANDBY MOUNTED

select SUPPLEMENTAL_LOG_DATA_ALL,SUPPLEMENTAL_LOG_DATA_MIN,
SUPPLEMENTAL_LOG_DATA_PK,SUPPLEMENTAL_LOG_DATA_UI,
SUPPLEMENTAL_LOG_DATA_FK,SUPPLEMENTAL_LOG_DATA_PL 
from v$database;

SUP SUPPLEME SUP SUP SUP SUP
--- -------- --- --- --- ---
NO  NO       NO  NO  NO  NO
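Had the rollback failed with supplemental logging left enabled, it could be cleared manually with something along the following lines (a hedged sketch; drop the identification key logging first, and only drop the levels v$database shows as enabled):
SQL> alter database drop supplemental log data (primary key, unique) columns;
SQL> alter database drop supplemental log data;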
6. End the upgrade plan with a call to DESTROY_PLAN.
SQL> exec DBMS_ROLLING.DESTROY_PLAN;
PL/SQL procedure successfully completed.
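Once the plan is destroyed, the rolling upgrade views should be empty again. A quick sanity check (the expectation of zero rows is an assumption based on the plan being removed):
SQL> select count(*) from dba_rolling_plan;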
7. Though the databases are back to the state they were in before the upgrade began, the Data Guard broker continues to say "ROLLING DATABASE MAINTENANCE IN PROGRESS".
DGMGRL> show configuration

Configuration - dg12c2

  Protection Mode: MaxAvailability
  Members:
  colombo - Primary database

  Members Not Receiving Redo:
  london  - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
    ROLLING DATABASE MAINTENANCE IN PROGRESS
Even though the standby is listed as not receiving redo, this is not actually the case. The apply-on status is shown in the DG broker output, and redo apply can be verified by looking at the standby alert log.
DGMGRL> show database london

Database - london

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 10.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    london

Database Status:
SUCCESS
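Redo apply can also be confirmed on the standby itself, for example with a quick look at v$managed_standby (a minimal check; MRP0 is the managed recovery process):
SQL> select process, status, thread#, sequence# from v$managed_standby where process = 'MRP0';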
Validate database also does not give any errors.
DGMGRL> validate database london

  Database Role:     Physical standby database
  Primary Database:  colombo

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Managed by Clusterware:
    colombo:  YES
    london :  YES
The reason for the DG broker outputting an erroneous message is that the notification of DBMS_ROLLING ending is included only in the FINISH phase.
SELECT instid, target, phase, description FROM DBA_ROLLING_PLAN;
...
83 colombo    FINISH         Notify Data Guard broker that DBMS_ROLLING has finished
84 london     FINISH         Notify Data Guard broker that DBMS_ROLLING has finished
The rollback and destroy phases have no provision to notify the DG broker that DBMS_ROLLING has finished. Due to this, the DG broker continues to output an incorrect status. As per MOS 2347179.1 this has been identified as a bug (26921039), which is fixed in 18c, but no backport is available for 12c.



Even though the DG broker gives an incorrect status, this doesn't affect the ability to switch over.
Connected to "colombo"
Connected as SYSDG.
DGMGRL> switchover to london
Performing switchover NOW, please wait...
Operation requires a connection to database "london"
Connecting ...
Connected to "london"
Connected as SYSDBA.
New primary database "london" is opening...
Oracle Clusterware is restarting database "colombo" ...
Switchover succeeded, new primary is "london"
DGMGRL> show configuration

Configuration - dg12c2

  Protection Mode: MaxAvailability
  Members:
  london  - Primary database

  Members Not Receiving Redo:
  colombo - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
    ROLLING DATABASE MAINTENANCE IN PROGRESS


DGMGRL> validate database colombo

  Database Role:     Physical standby database
  Primary Database:  london

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Managed by Clusterware:
    london :  YES
    colombo:  YES

  Log Files Cleared:
    london Standby Redo Log Files:   Cleared
    colombo Online Redo Log Files:   Not Cleared
    colombo Standby Redo Log Files:  Available


DGMGRL> switchover to colombo
Performing switchover NOW, please wait...
Operation requires a connection to database "colombo"
Connecting ...
Connected to "colombo"
Connected as SYSDBA.
New primary database "colombo" is opening...
Oracle Clusterware is restarting database "london" ...
Switchover succeeded, new primary is "colombo"
DGMGRL> show configuration

Configuration - dg12c2

  Protection Mode: MaxAvailability
  Members:
  colombo - Primary database
    london  - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 2 seconds ago)
The only workaround mentioned in MOS 2347179.1 to clear this incorrect DG broker status message is to restart the primary, which will incur unnecessary downtime. Testing showed that restarting the primary does clear the incorrect status message. As seen in the above outputs, even a switchover does the same (as it results in the DB being restarted).
Alternatively, recreating the DG broker configuration also cleared the incorrect status. This seems to be the better option as it doesn't require any downtime. A sketch of recreating the configuration is shown below.
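As a minimal sketch, recreating the configuration with the database names used in this post would look like the following (assuming TNS aliases matching the database names; adjust for your own environment):
DGMGRL> remove configuration;
DGMGRL> create configuration dg12c2 as primary database is colombo connect identifier is colombo;
DGMGRL> add database london as connect identifier is london maintained as physical;
DGMGRL> enable configuration;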