Saturday, November 7, 2020

Recovery Catalog and Data Guard

Previous post listed steps for setting up a recovery catalog. This post looks at steps specific to using recovery catalog in a data guard environment.
Data guard doesn't chagne the way how a recovery catalog is created. However, if multple standby instances exists then the space usage would be higher compared to single instance. Also as multiple standby need to connect the bandwidth requirments would also increase. It was noticed that standby in geographically distant location took longer to sync than instances close by to the recovery catalog. The main reason for connecting standbys to recovery catalog is so there's consistent view of backups of the whole configuration. This is useful if backups are interchanged between primary and standby. If the data guard configuration also consists of standby instances that are part of a reader farm and not seutp to be a primary in the future then those could be omitted from the recovery catalog.

    Registering Primary Database
    Registering Standby Database
    Far Sync and Recovery Catalog
    Unregistering a Database From Recovery Catalog
    Role Changes
    DB Structure Changes
    Other issues

Registering Primary Database
In a data guard setup only the primary database is explicitly registered in the recovery catalog. Same as in previous post connect to the target and the catalog and run register database command.
RMAN> register database;

database registered in recovery catalog
starting full resync of recovery catalog

full resync complete

RMAN> list db_unique_name all;

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 3423639639       PRIMARY          PPCBXDB1
Registering Standby Database
To register a standby conenct to the standby as target DB and to recovery catalog. The documentation says connection alone is enough to register the standby in the recovery catalog.
rman target / catalog catalog_user@catalog_tns

Recovery Manager: Release 19.0.0.0.0 - Production on Thu Sep 3 11:06:51 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PPCBXDB1 (DBID=1223456789, not open)
recovery catalog database Password:
connected to recovery catalog database
However, this alone wasn't enough to register the standby. While connected run any comamnd such as show all and this would start an implicit resync with recovery catalog. At the end of this command the standby would be registered in the recvoery catalog.
show all;

RMAN> list db_unique_name all;

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 1223456789       PRIMARY          PPCBXDB1
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB2
While the standby resync continues it holds an explicit lock on the DB table. If multple standbys are registering at the same time this would appaer as row lock contention.
Therefore it's best to add one standby at a time. While this lock remains existing databases won't be able to complete their backups while connected to the catalog.

Far Sync and Recovery Catalog
Archive logs could be backed up on far sync instances as they are shipped to final destination. Far sync instances could be added to the recovery catalog same way standby instances are added. However, adding far sync instances to recovery catalog causes an issue. The recovery catalog list them as primary and doesn't recognize the far sync instance.In this case PPCBXFS1 is the far sync instance
 RMAN> list db_unique_name all;

starting full resync of recovery catalog
full resync complete

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 1223456789       PRIMARY          PPCBXFS1
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB1
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB2
After an SR oracle support confirmed there's no need to add far sync to the recovery catalog. As such do not add far sync to recovery catalog and any backups taken on far sync should use the control file for record keeping.

Unregistering a Database From Recovery Catalog
To unregister the priamry database use the same step as before. To unregister a standby database use db_unique_name to explicitly specify the standby instance.
RMAN>   list db_unique_name all;

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 1223456789       PRIMARY          PPCBXDB1
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB3
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB6
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB4
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB2

RMAN> unregister Db_unique_name PPCBXDB6;

database db_unique_name is "PPCBXDB6", db_name is "PPCBXDB1" and DBID is 1223456789

Want to unregister the database with target db_unique_name (enter YES or NO)? YES
database with db_unique_name PPCBXDB6 unregistered from the recovery catalog

RMAN> list db_unique_name all;

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 1223456789       PRIMARY          PPCBXDB1
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB4
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB2
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB3
An intresting situation occurs when recovery catalog only has two primary databases with same name but different unique names. For example this could happen when all the standbys are removed from the recovery catalog and a former standby becomes the new primary. Then a new primary database is added with the same db name as the original primary. This second primary will have a different DB ID. Following shows this situation.
RMAN> list db_unique_name all;


List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 1223456789       PRIMARY          PPCBXDB2
1039212 PPCBXDB1 5223456789       PRIMARY          PPCBXDB1
PPCBXDB2 is a former standby that is now primary with DB id 1223456789. It has the db name of PPCBXDB1. Then another db of same name PPCBXDB1 is created and has the DB ID 5223456789. In this case unregistering the PPCBXDB2 refering the db unique name will fail.
RMAN> unregister db_unique_name PPCBXDB2;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-06740: database name is not specified
To unregister use the database name. Then a database name and DB ID combination is prompted to identify the correct database.
RMAN> unregister database PPCBXDB1;

database name is "PPCBXDB1" and DBID is 1223456789

Do you really want to unregister the database (enter YES or NO)? YES
database unregistered from the recovery catalog
As 1223456789 is the DB ID of PPCBXDB2 it will be removed from recovery catalog leaving only the PPCBXDB1
RMAN> list db_unique_name all;


List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1039212 PPCBXDB1 5223456789       PRIMARY          PPCBXDB1

Role Changes
After a role change when the new primary connect to the recovery catalog an implicit resync update the new database role information.Before the role change db1 is primary
RMAN> list db_unique_name all;

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 3423639639       PRIMARY          PPCBXDB1
1       PPCBXDB1 3423639639       STANDBY          PPCBXDB2
After a switchover db2 become primary.
RMAN> list db_unique_name all;

starting full resync of recovery catalog
full resync complete

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
80555   PPCBXDB1 3423639639       PRIMARY          PPCBXDB2
80555   PPCBXDB1 3423639639       STANDBY          PPCBXDB1


DB Structure Changes
When a DB structure change (such as adding a data file) happens on the primary the implicit resync on the standby (which happens just before a backup is taken on the standby) fails with the following error.
Starting backup at 03-september-2020 12:15:16 pm
current log archived at primary database
ORA-20079: full resync from primary database is not done

starting resync from primary
resyncing from database with DB_UNIQUE_NAME pptbxdb1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 09/03/2020 12:15:28
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of partial resync command on default channel at 09/03/2020 12:15:28
RMAN-06613: Connect identifier for DB_UNIQUE_NAME ppcbxdb2 not configured
There are two ways to resovle this issue. As per the error could setup connection identifiers on all the databases to each other (considering future role changes. refer 1604302.1). This would be of the following format
CONFIGURE DB_UNIQUE_NAME 'db unique name of the db'   CONNECT IDENTIFIER  'tns to the db';
Second method is for the primary to resync itself with the recovery catalog after the DB structure change and before the standby attempts a backup. If primary is already in sync with recovery catalog after the structure change any subsequent backups on standby would not result in the above error.

Other Issues
A scheduled backup on a standby failed with the following error.
RMAN-08243: starting resync of recovery catalog
RMAN-08192: uncatalog BACKUPPIECE /ora/backup/2020-09-18/full_bkp_PPCBXDB1_20200918_k8vam7sg_1_1 in NOCATALOG mode
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 09/18/2020 14:41:14
RMAN-20110: set_stamp set_count conflict
This is a known issue and mentioned on the 2453569.1. Solution is to uncatalog the backuppiece and resync and catalog again. However, this only worked partially. Subsquent recataloging of the backuppiece failed with the same error.
RMAN> change backuppiece '/ora/backup/2020-09-18/full_bkp_PPCBXDB1_20200918_k8vam7sg_1_1' uncatalog;

uncataloged backup piece
backup piece handle=/ora/backup/2020-09-18/full_bkp_PPCBXDB1_20200918_k8vam7sg_1_1 RECID=593 STAMP=1051402128
Uncataloged 1 objects

next resync was done and completed successfully. However, subsequent catalog of the uncatalog backuppiece above fails with the same error.

RMAN> resync catalog;
RMAN-08243: starting resync of recovery catalog
RMAN-08245: resync complete

RMAN> catalog device type DISK backuppiece '/ora/backup/2020-09-18/full_bkp_PPCBXDB1_20200918_k8vam7sg_1_1';

RMAN-08192: uncatalog BACKUPPIECE /ora/backup/2020-09-18/full_bkp_PPCBXDB1_20200918_k8vam7sg_1_1 in NOCATALOG mode
RMAN-03002: failure of catalog command at 09/18/2020 14:27:39
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of partial resync command on default channel at 09/18/2020 14:27:39
RMAN-20110: set_stamp set_count conflict 
After a SR was raised Oracle support instructed to uncatalog the backup piece from both standby and primary. Earlier it was uncataloged only on the standby instance where the issue happened. However, this solution could not be verified. By the time Oracle support came back with the solution the backup retention period has passed and backup piece in question was obsolete and deleted. After the problematic backup piece was deleted the backup resync with catalog continued without any issue on the standby instance that had this issue. During the time period where the issue remained the primary and other standby instances cotninue to resync with recovery catalog without any issue.

Useful metalink notes
RMAN Compatibility Matrix [ID 73431.1]
RMAN Resync of Recovery Catalog Fails with - RMAN-03014 and RMAN-20110: Set_stamp Set_Count Conflict [ID 2453569.1]
After New Files are Added to primary, RMAN Fails at Standby with ORA-20079: Full Resync From Primary Database Is Not Done [ID 1604302.1]

Related Posts
Recovery Catalog