Saturday, August 8, 2020

Removing a Failed Standby Database From a Data Guard Configuration

A previous post explained the steps for removing a standby instance from a Data Guard configuration. This post covers the same task when the standby being removed has failed and cannot be reached (or connected to).
In a Data Guard configuration with multiple standby databases, one instance is unreachable due to a hardware failure. The failure is unrecoverable, and the only option is to rebuild the node and the standby instance. In the meantime, the existing configuration reports an error status because the failed instance is unavailable.
DGMGRL> show configuration

Configuration - fc_pp_dg

  Protection Mode: MaxAvailability
  Members:
  ppdb1  - Primary database
    ppdb2  - Physical standby database
    ppdb3  - Physical standby database
      ppdb4  - Physical standby database (receiving current redo)
    ppfs1  - Far sync instance
      ppdb5  - Physical standby database
      ppdb6  - Physical standby database
      ppdb9  - Physical standby database
      ppdb10 - Physical standby database

  Members Not Receiving Redo:
  ppfs2  - Far sync instance (alternate of ppfs1)
  ppdb8  - Physical standby database
    Error: ORA-12170: TNS:Connect timeout occurred

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 96 seconds ago)
As the first step, remove any references to the failed instance from the RedoRoutes properties of the remaining members.
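
One way to find those references is to print the RedoRoutes property of each remaining member and look for the failed standby's name. The shell sketch below does that. The function name redo_routes_refs and the DGMGRL override variable are hypothetical, and it assumes dgmgrl is on the PATH of a reachable member and that OS authentication (connecting as /) is permitted there.

```shell
# Sketch only: report which members still name the failed standby in RedoRoutes.
# Assumptions: run on a reachable member, OS authentication ("/") works, and
# every argument after the first is "<type>:<name>", where <type> is
# database or far_sync.
redo_routes_refs() {
    local failed entry kind name
    failed="$1"
    shift
    for entry in "$@"; do
        kind="${entry%%:*}"   # text before the first colon
        name="${entry#*:}"    # text after the first colon
        # SHOW DATABASE / SHOW FAR_SYNC <name> 'RedoRoutes' prints the property value;
        # grep -w matches the failed name as a whole word, -i ignores case.
        if "${DGMGRL:-dgmgrl}" -silent / "show $kind $name 'RedoRoutes'" \
                | grep -qiw -- "$failed"; then
            echo "$name"
        fi
    done
}

# Example call, using member names from the configuration above:
# redo_routes_refs ppdb8 database:ppdb1 database:ppdb3 far_sync:ppfs1 far_sync:ppfs2
```

Any member the function prints would then need its RedoRoutes property edited (EDIT DATABASE or EDIT FAR_SYNC with SET PROPERTY RedoRoutes) before the failed standby is removed.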

Then issue the remove command, which will succeed with a warning.
DGMGRL> remove database ppdb8;
Warning: ORA-16620: one or more members could not be reached for a remove operation

Removed database "ppdb8" from the configuration
The warning is due to the broker being unable to connect to the failed instance to execute the cleanup commands. The Data Guard broker log shows this.
2020-07-29T12:33:36.403+00:00
Failed to connect to remote database ppdb8. Error is ORA-12170
Metadata Resync failed. Status = ORA-12170
2020-07-29T12:33:48.691+00:00
Failed to connect to remote database ppdb8. Error is ORA-12170
Failed to send message to member ppdb8. Error code is ORA-12170.
Data Guard Broker Status Summary:
  Type                        Name                             Severity  Status
  Configuration               fc_pp_dg                       Warning  ORA-16607: one or more members have failed
  Primary Database            ppdb1                          Success  ORA-0: normal, successful completion
  Physical Standby Database   ppdb2                          Success  ORA-0: normal, successful completion
  Physical Standby Database   ppdb3                          Success  ORA-0: normal, successful completion
  Physical Standby Database   ppdb4                          Success  ORA-0: normal, successful completion
  Physical Standby Database   ppdb5                          Success  ORA-0: normal, successful completion
  Physical Standby Database   ppdb6                          Success  ORA-0: normal, successful completion
  Far Sync Instance           ppfs1                          Success  ORA-0: normal, successful completion
  Far Sync Instance           ppfs2                          Success  ORA-0: normal, successful completion
  Physical Standby Database   ppdb8                            Error  ORA-12170: TNS:Connect timeout occurred
  Physical Standby Database   ppdb9                          Success  ORA-0: normal, successful completion
  Physical Standby Database   ppdb10                         Success  ORA-0: normal, successful completion
2020-07-29T12:34:00.979+00:00
Failed to connect to remote database ppdb8. Error is ORA-12170
Failed to send message to member ppdb8. Error code is ORA-12170.
2020-07-29T12:34:05.646+00:00
REMOVE DATABASE ppdb8
2020-07-29T12:34:17.939+00:00
Failed to connect to remote database ppdb8. Error is ORA-12170
Failed to send message to member ppdb8. Error code is ORA-12170.
Database ppdb8 (0x0a001000) could not be contacted for database removal, status = ORA-12170
2020-07-29T12:34:31.571+00:00
Failed to connect to remote database ppdb8. Error is ORA-12170
Failed to send message to member ppdb8. Error code is ORA-12170.
2020-07-29T12:34:33.297+00:00
Database ppdb8 removal completed with warning ORA-16620
REMOVE DATABASE  completed with warning ORA-16620
However, all the other databases in the Data Guard configuration will have had their log_archive_config parameter updated to remove any reference to the failed database.
NAME                           VALUE
------------------------------ -----------------------------------------
log_archive_config             dg_config=(ppdb1,ppdb2,ppdb3,ppdb4,ppfs1,
                               ppdb5,ppdb6,ppdb9,ppdb10,ppfs2)


Once the failed instance is removed, the Data Guard broker shows a status of SUCCESS.
DGMGRL>  show configuration

Configuration - fc_pp_dg

  Protection Mode: MaxAvailability
  Members:
  ppdb1  - Primary database
    ppdb2  - Physical standby database
    ppdb3  - Physical standby database
      ppdb4  - Physical standby database (receiving current redo)
    ppfs1  - Far sync instance
      ppdb5  - Physical standby database
      ppdb6  - Physical standby database
      ppdb9  - Physical standby database
      ppdb10 - Physical standby database

  Members Not Receiving Redo:
  ppfs2  - Far sync instance (alternate of ppfs1)

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 55 seconds ago)
Related Posts
Removing a Standby Database From a Data Guard Configuration
Adding a New Physical Standby to Exiting Data Guard Setup

Saturday, August 1, 2020

Installing Using Gold Image Results in [FATAL] [INS-35952] or [FATAL] [INS-42505]

Creating a gold image is a good way to have a standardized starting point for an installation, whether an Oracle home or Grid Infrastructure. The gold image creation process itself eliminates a lot of unnecessary files from the final image, and if needed more can be explicitly excluded (e.g. .patch_storage). However, the image may still contain some host-specific files, such as s_crsconfig_<host_name>_env.txt in Oracle Restart or a password file in an Oracle home.
It is possible to clean up these files before running a new installation from the gold image. However, doing so results in an error, as shown below (on Oracle Restart).
./gridSetup.sh -createGoldImage -destinationLocation /opt/installs -silent -exclFiles $ORACLE_HOME/.patch_storage
Launching Oracle Grid Infrastructure Setup Wizard...

[FATAL] [INS-42505] The installer has detected that the Oracle Grid Infrastructure home software at (/opt/app/oracle/product/19.x.0/grid) is not complete.
   CAUSE: Following files are missing:
[/opt/app/oracle/product/19.x.0/grid/crs/install/s_crsconfig_sandpit-oracle-db_env.txt, /opt/app/oracle/product/19.x.0/grid/crs/utl/sandpit-oracle-db, /opt/app/oracle/product/19.x.0/grid/crs/utl/sandpit-oracle-db/crsconfig_dirs, /opt/app/oracle/product/19.x.0/grid/crs/utl/sandpit-oracle-db/crsconfig_fileperms, /opt/app/oracle/product/19.x.0/grid/evm/log/sandpit-oracle-db_evmProcessEventSource, /opt/app/oracle/product/19.x.0/grid/install/root_sandpit-oracle-db_2019-07-12_11-48-15-995329600.log, /opt/app/oracle/product/19.x.0/grid/opmn/conf/ons.config.bak.sandpit-oracle-db.oracle, /opt/app/oracle/product/19.x.0/grid/opmn/conf/ons.config.sandpit-oracle-db, /opt/app/oracle/product/19.x.0/grid/opmn/conf/ons.config.sandpit-oracle-db.bak, /opt/app/oracle/product/19.x.0/grid/opmn/logs/ons.log.sandpit-oracle-db, /opt/app/oracle/product/19.x.0/grid/opmn/logs/ons.out.sandpit-oracle-db]
   ACTION: Ensure that the Oracle Grid Infrastructure home at (/opt/app/oracle/product/19.x.0/grid) includes the files listed above.

In this case the installer is complaining about missing files whose names include the hostname of the server on which the gold image was created.
Similarly, the following output shows the same kind of error when running an installation from an Oracle home gold image. In this case the Data Guard configuration files and the password file were removed because they were not relevant to the current installation.

./runInstaller -createGoldImage -destinationLocation /opt/installs/dbhome -silent -exclFiles $ORACLE_HOME/.patch_storage
Launching Oracle Database Setup Wizard...

[FATAL] [INS-35952] The installer has detected that the Oracle Database home software at (/opt/app/oracle/product/19.x.0/dbhome_1) is not complete.
   CAUSE: Following files are missing:
[/opt/app/oracle/product/19.x.0/dbhome_1/dbs/dr1sandpitm.dat, /opt/app/oracle/product/19.x.0/dbhome_1/dbs/dr2sandpitm.dat, /opt/app/oracle/product/19.x.0/dbhome_1/dbs/hc_sandpitm.dat, /opt/app/oracle/product/19.x.0/dbhome_1/dbs/init.ora, /opt/app/oracle/product/19.x.0/dbhome_1/dbs/lkSANDPITM, /opt/app/oracle/product/19.x.0/dbhome_1/dbs/orapwsandpitm, /opt/app/oracle/product/19.x.0/dbhome_1/dbs/snapcf_sandpitm.f]
   ACTION: Ensure that the Oracle Database home at (/opt/app/oracle/product/19.x.0/dbhome_1) includes the files listed above.




The reason for these errors is that a full list of the files that are part of the image is kept in $GI_HOME/install/files.lst or $OH_HOME/install/files.lst. At run time the installer checks that all of these files are present in the unzipped location.
Therefore, the solution is to remove the entries that are not relevant from files.lst. This allows the installers to run without the above errors.
An SR raised for this issue confirmed that it is fine to edit files.lst manually.
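
The manual edit can also be scripted. The following is a minimal shell sketch, assuming files.lst holds one entry per line (the exact layout of the file is not shown here, so check it first). The function name prune_files_lst is hypothetical. It drops every line containing a given fixed string, such as the source hostname, and keeps the unmodified list as files.lst.orig.

```shell
# Sketch only: remove entries containing a fixed string from a files.lst.
# Assumption: one entry per line. A backup is kept next to the original.
prune_files_lst() {
    local list_file needle
    list_file="$1"
    needle="$2"
    [ -f "$list_file" ] || { echo "no such file: $list_file" >&2; return 1; }
    [ -n "$needle" ] || { echo "empty pattern" >&2; return 1; }
    # Keep the untouched list so the edit can be reverted.
    cp -p "$list_file" "${list_file}.orig" || return 1
    # -F treats the pattern as a literal string, -v keeps non-matching lines.
    # grep exits 1 when no lines remain; only exit codes above 1 are real errors.
    grep -vF -- "$needle" "${list_file}.orig" > "$list_file" || [ $? -eq 1 ]
}

# Example call, using the hostname from the INS-42505 output above:
# prune_files_lst "$GI_HOME/install/files.lst" sandpit-oracle-db
```

Matching on a literal string rather than a regular expression avoids surprises with characters such as dots in file names. The function can be run once per pattern when several kinds of entries need to go.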

Related Posts
Creating Gold Images on 18c