Monday, November 30, 2020

High Disk file operations I/O Waits When Querying Unified Audit Trail

DB monitoring showed high waits for Disk file operations I/O event.
The select query originated from the EM console.
The database is running pure unified auditing (not mix mode) as such the dba_common_audit_trail part was of no concern. A simple select query on the unified audit trail also showed the same high wait times for Disk file operations I/O.
The issue was happening only on the CDB$root unified audit trail. The PDB audit trail had no issue.
Gathering statistics on audsys didn't help either.
exec dbms_STATS.GATHER_TABLE_STATS(OWNNAME=>'AUDSYS',TABNAME=>'AUD$UNIFIED',degree=>4);
Finally ran a 10046 trace and it showed the following for the Disk file operations I/O wait.
WAIT #140628544496888: nam='Disk file operations I/O' ela= 2352 FileOperation=8 fileno=1 filetype=36 obj#=-1 tim=5526698301528
It was starge objection id is -1 but stranger was there was no filetype 36 listed.
select file_no,filetype_id,FILETYPE_NAME from v$iostat_file where file_no=1 and filetype_id=36;

no rows selected 


At this stage looked at the historical data for the disk file operation I/O wait event on the AWR. It showed the first high waits occured soon after the swtichover to standby (circled in black). Even after switching back to original primary the wait events continued (circled in blue). Prior the switchover there was very low wait events and wait times on the primary.
When unified auditing is enabled on a standby it creates audit files on the local disk in the $ORACLE_BASE/audit/$ORACLE_SID directory. Based on above pattern a hypothesis was devised that unified audit trail could be getting populated from records in the audit files on the local disk that were craeted when it was a standby.
This hypothesis was tested out by archiving and removing the audit records created in the $ORACLE_BASE/audit/$ORACLE_SID. Afterwards querying the unified audit trail did not result in high disk file operation I/O.
Comparing the audit record count before the audit files on the local disks were removed
select count(*) from unified_audit_trail;

COUNT(*)
----------
761793 
and after
select count(*) from unified_audit_trail;

COUNT(*)
----------
516385
shows that some records from audit files on the local disk were used to populat the unified audit trail view.

Saturday, November 7, 2020

Recovery Catalog and Data Guard

Previous post listed steps for setting up a recovery catalog. This post looks at steps specific to using recovery catalog in a data guard environment.
Data guard doesn't chagne the way how a recovery catalog is created. However, if multple standby instances exists then the space usage would be higher compared to single instance. Also as multiple standby need to connect the bandwidth requirments would also increase. It was noticed that standby in geographically distant location took longer to sync than instances close by to the recovery catalog. The main reason for connecting standbys to recovery catalog is so there's consistent view of backups of the whole configuration. This is useful if backups are interchanged between primary and standby. If the data guard configuration also consists of standby instances that are part of a reader farm and not seutp to be a primary in the future then those could be omitted from the recovery catalog.

    Registering Primary Database
    Registering Standby Database
    Far Sync and Recovery Catalog
    Unregistering a Database From Recovery Catalog
    Role Changes
    DB Structure Changes
    Other issues

Registering Primary Database
In a data guard setup only the primary database is explicitly registered in the recovery catalog. Same as in previous post connect to the target and the catalog and run register database command.
RMAN> register database;

database registered in recovery catalog
starting full resync of recovery catalog

full resync complete

RMAN> list db_unique_name all;

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 3423639639       PRIMARY          PPCBXDB1
Registering Standby Database
To register a standby conenct to the standby as target DB and to recovery catalog. The documentation says connection alone is enough to register the standby in the recovery catalog.
rman target / catalog catalog_user@catalog_tns

Recovery Manager: Release 19.0.0.0.0 - Production on Thu Sep 3 11:06:51 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PPCBXDB1 (DBID=1223456789, not open)
recovery catalog database Password:
connected to recovery catalog database
However, this alone wasn't enough to register the standby. While connected run any comamnd such as show all and this would start an implicit resync with recovery catalog. At the end of this command the standby would be registered in the recvoery catalog.
show all;

RMAN> list db_unique_name all;

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 1223456789       PRIMARY          PPCBXDB1
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB2
While the standby resync continues it holds an explicit lock on the DB table. If multple standbys are registering at the same time this would appaer as row lock contention.
Therefore it's best to add one standby at a time. While this lock remains existing databases won't be able to complete their backups while connected to the catalog.

Far Sync and Recovery Catalog
Archive logs could be backed up on far sync instances as they are shipped to final destination. Far sync instances could be added to the recovery catalog same way standby instances are added. However, adding far sync instances to recovery catalog causes an issue. The recovery catalog list them as primary and doesn't recognize the far sync instance.In this case PPCBXFS1 is the far sync instance
 RMAN> list db_unique_name all;

starting full resync of recovery catalog
full resync complete

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 1223456789       PRIMARY          PPCBXFS1
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB1
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB2
After an SR oracle support confirmed there's no need to add far sync to the recovery catalog. As such do not add far sync to recovery catalog and any backups taken on far sync should use the control file for record keeping.

Unregistering a Database From Recovery Catalog
To unregister the priamry database use the same step as before. To unregister a standby database use db_unique_name to explicitly specify the standby instance.
RMAN>   list db_unique_name all;

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 1223456789       PRIMARY          PPCBXDB1
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB3
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB6
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB4
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB2

RMAN> unregister Db_unique_name PPCBXDB6;

database db_unique_name is "PPCBXDB6", db_name is "PPCBXDB1" and DBID is 1223456789

Want to unregister the database with target db_unique_name (enter YES or NO)? YES
database with db_unique_name PPCBXDB6 unregistered from the recovery catalog

RMAN> list db_unique_name all;

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 1223456789       PRIMARY          PPCBXDB1
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB4
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB2
1       PPCBXDB1 1223456789       STANDBY          PPCBXDB3
An intresting situation occurs when recovery catalog only has two primary databases with same name but different unique names. For example this could happen when all the standbys are removed from the recovery catalog and a former standby becomes the new primary. Then a new primary database is added with the same db name as the original primary. This second primary will have a different DB ID. Following shows this situation.
RMAN> list db_unique_name all;


List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 1223456789       PRIMARY          PPCBXDB2
1039212 PPCBXDB1 5223456789       PRIMARY          PPCBXDB1
PPCBXDB2 is a former standby that is now primary with DB id 1223456789. It has the db name of PPCBXDB1. Then another db of same name PPCBXDB1 is created and has the DB ID 5223456789. In this case unregistering the PPCBXDB2 refering the db unique name will fail.
RMAN> unregister db_unique_name PPCBXDB2;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-06740: database name is not specified
To unregister use the database name. Then a database name and DB ID combination is prompted to identify the correct database.
RMAN> unregister database PPCBXDB1;

database name is "PPCBXDB1" and DBID is 1223456789

Do you really want to unregister the database (enter YES or NO)? YES
database unregistered from the recovery catalog
As 1223456789 is the DB ID of PPCBXDB2 it will be removed from recovery catalog leaving only the PPCBXDB1
RMAN> list db_unique_name all;


List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1039212 PPCBXDB1 5223456789       PRIMARY          PPCBXDB1

Role Changes
After a role change when the new primary connect to the recovery catalog an implicit resync update the new database role information.Before the role change db1 is primary
RMAN> list db_unique_name all;

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       PPCBXDB1 3423639639       PRIMARY          PPCBXDB1
1       PPCBXDB1 3423639639       STANDBY          PPCBXDB2
After a switchover db2 become primary.
RMAN> list db_unique_name all;

starting full resync of recovery catalog
full resync complete

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
80555   PPCBXDB1 3423639639       PRIMARY          PPCBXDB2
80555   PPCBXDB1 3423639639       STANDBY          PPCBXDB1


DB Structure Changes
When a DB structure change (such as adding a data file) happens on the primary the implicit resync on the standby (which happens just before a backup is taken on the standby) fails with the following error.
Starting backup at 03-september-2020 12:15:16 pm
current log archived at primary database
ORA-20079: full resync from primary database is not done

starting resync from primary
resyncing from database with DB_UNIQUE_NAME pptbxdb1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 09/03/2020 12:15:28
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of partial resync command on default channel at 09/03/2020 12:15:28
RMAN-06613: Connect identifier for DB_UNIQUE_NAME ppcbxdb2 not configured
There are two ways to resovle this issue. As per the error could setup connection identifiers on all the databases to each other (considering future role changes. refer 1604302.1). This would be of the following format
CONFIGURE DB_UNIQUE_NAME 'db unique name of the db'   CONNECT IDENTIFIER  'tns to the db';
Second method is for the primary to resync itself with the recovery catalog after the DB structure change and before the standby attempts a backup. If primary is already in sync with recovery catalog after the structure change any subsequent backups on standby would not result in the above error.

Other Issues
A scheduled backup on a standby failed with the following error.
RMAN-08243: starting resync of recovery catalog
RMAN-08192: uncatalog BACKUPPIECE /ora/backup/2020-09-18/full_bkp_PPCBXDB1_20200918_k8vam7sg_1_1 in NOCATALOG mode
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 09/18/2020 14:41:14
RMAN-20110: set_stamp set_count conflict
This is a known issue and mentioned on the 2453569.1. Solution is to uncatalog the backuppiece and resync and catalog again. However, this only worked partially. Subsquent recataloging of the backuppiece failed with the same error.
RMAN> change backuppiece '/ora/backup/2020-09-18/full_bkp_PPCBXDB1_20200918_k8vam7sg_1_1' uncatalog;

uncataloged backup piece
backup piece handle=/ora/backup/2020-09-18/full_bkp_PPCBXDB1_20200918_k8vam7sg_1_1 RECID=593 STAMP=1051402128
Uncataloged 1 objects

next resync was done and completed successfully. However, subsequent catalog of the uncatalog backuppiece above fails with the same error.

RMAN> resync catalog;
RMAN-08243: starting resync of recovery catalog
RMAN-08245: resync complete

RMAN> catalog device type DISK backuppiece '/ora/backup/2020-09-18/full_bkp_PPCBXDB1_20200918_k8vam7sg_1_1';

RMAN-08192: uncatalog BACKUPPIECE /ora/backup/2020-09-18/full_bkp_PPCBXDB1_20200918_k8vam7sg_1_1 in NOCATALOG mode
RMAN-03002: failure of catalog command at 09/18/2020 14:27:39
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of partial resync command on default channel at 09/18/2020 14:27:39
RMAN-20110: set_stamp set_count conflict 
After a SR was raised Oracle support instructed to uncatalog the backup piece from both standby and primary. Earlier it was uncataloged only on the standby instance where the issue happened. However, this solution could not be verified. By the time Oracle support came back with the solution the backup retention period has passed and backup piece in question was obsolete and deleted. After the problematic backup piece was deleted the backup resync with catalog continued without any issue on the standby instance that had this issue. During the time period where the issue remained the primary and other standby instances cotninue to resync with recovery catalog without any issue.

Useful metalink notes
RMAN Compatibility Matrix [ID 73431.1]
RMAN Resync of Recovery Catalog Fails with - RMAN-03014 and RMAN-20110: Set_stamp Set_Count Conflict [ID 2453569.1]
After New Files are Added to primary, RMAN Fails at Standby with ORA-20079: Full Resync From Primary Database Is Not Done [ID 1604302.1]

Related Posts
Recovery Catalog

Friday, November 6, 2020

Recovery Catalog

Recovery catalog allows to store database metadata in a dedicated database schema. It can have longer renteion for backup records than storing them in control file. If data guard is used a recovery catalog is needed to easily interchange backups between standby and primary. This post gives few highlights of using recovery catalog.

    Creating Recovery Catalog
    Registering Database in Recovery Catalog
    Sync Changes to Catalog
    Unregistering a Database From Recovery Catalog
    Dropping the Recovery Catalog

Creating Recovery Catalog
To be useful recovery catalog must reside external to the database registered within. Usually on a dedicated DB server and Oracle provides restricted special usage license to be used with databases used purely for recovery catalog.
It's adivsed to use a dedicated tablespace for recovery catalog schema user. In this post following tablespace is used.
SQL> create tablespace recocattbs datafile size 50m autoextend on next 50m maxsize unlimited;

Tablespace created.
The recovery catalog schema user could be created in a pdb or non-cdb.
create user recocat identified by CAT_#123reco default tablespace recocattbs temporary tablespace temp quota unlimited on recocattbs;
The recovery catalog user need following role to create the catalog.
GRANT RECOVERY_CATALOG_OWNER TO recocat;
However, this role alone isn't enough to prevent the following error.
RMAN> create catalog;

error creating dbms_rcvcat package body
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-06433: error installing recovery catalog
To resolve this add the following (refer 1086561.1)
grant SELECT ON USER_DB_LINKS to recocat;
If as part of DB security hardening (CIS standard) if certain execution rights were revoked from public then grant those privileges to recovery catalog user. Below is the required list for recovery catalog.
grant execute on utl_http to recocat;
grant execute on dbms_lob to recocat;
grant execute on dbms_xmlgen to recocat;
grant execute on dbms_sql to recocat;
With the role and privileges granted connect to the recovery catalog using rman and create the catalog.
rman catalog recocat/CAT_#123reco

RMAN> create catalog;

recovery catalog created
Registering Database in Recovery Catalog
To register a database in recovery catalog connect to the target database and the recovery catalog at the same time.
rman target / catalog recocat/CAT_#123reco@devpdb

Recovery Manager: Release 19.0.0.0.0 - Production on Mon Aug 24 17:30:37 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DEVCDB (DBID=2653754887)
connected to recovery catalog database
. Once connected run register database command. First time this is run an implicit resync happens. Depending on amount of records kept in controlfile, netwrok bandwidth between target DB and recovery catalog DB server this could take some time.
RMAN> register database;

database registered in recovery catalog
starting full resync of recovery catalog
full resync complete
Once registered with recovery catalog all the database registered could be viewed with following.
RMAN> list db_unique_name all;


List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       DEVCDB   2653754887       PRIMARY          DEVCDB
608     XGENLT   4291278017       PRIMARY          XGENLT
Sync Changes to Catalog
Whenever changes are made to DB while disconnected from recovery catalog, those changes are sync to recovery catalog with an implicit sync job. This happpens when next time target DB is connected to the recovery catalog and a command is run (not necessarily a sync comamnd). For exmaple the retention is changed while disconnected from recovery catalog.
rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Mon Aug 24 17:32:49 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DEVCDB (DBID=2653754887)


RMAN>  CONFIGURE RETENTION POLICY TO recovery window of  1 days;

new RMAN configuration parameters:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 1 DAYS;
new RMAN configuration parameters are successfully stored
When next time connected to recvoery catalog an implicit resync happens when a command (this case show command) is run.
rman target / catalog recocat/CAT_#123reco@devpdb

Recovery Manager: Release 19.0.0.0.0 - Production on Mon Aug 24 17:35:43 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DEVCDB (DBID=2653754887)
connected to recovery catalog database

RMAN>  show all for db_unique_name devcdb;

starting full resync of recovery catalog
full resync complete
RMAN configuration parameters for database with db_unique_name DEVCDB are:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 1 DAYS;
If an explicity resync is needed then run the following.
RMAN> resync catalog;
starting resync of recovery catalog
resync complete




Unregistering a Database From Recovery Catalog
To unregister a database from recovery catalog invoke unregister command with the db_name or db_unique_name (in the case of standby instances)
RMAN> unregister database xgenlt;

database name is "XGENLT" and DBID is 4291278017

Do you really want to unregister the database (enter YES or NO)? yes
database unregistered from the recovery catalog

RMAN> list db_unique_name all;


List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
1       DEVCDB   2653754887       PRIMARY          DEVCDB
Oracle documentation mentions an option to unregister with backups. But invoking this results in an error.
RMAN> unregister db_unique_name PPCBXDB5  including backups;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of unregister db & bck command at 09/03/2020 19:15:34
RMAN-06014: command not implemented yet: unregister db & bck
This is mentioned as bug 5308590.

Dropping the Recovery Catalog
To drop the catalog connect to target and recovery catalog as recovery catalog owner and run the drop catalog command twice.
 rman target / catalog recocat/CAT_#123reco@devpdb

RMAN> drop catalog;

recovery catalog owner is RECOCAT
enter DROP CATALOG command again to confirm catalog removal

RMAN>  drop catalog ;

recovery catalog dropped