Friday, June 14, 2019

ORA-03186: Cannot start Oracle ADG recovery on a non-Oracle Cloud database on a server that is not a primary server

ORA-03186 occurs on a data guard configuration if the standby (or one of the standbys in a multiple standby configuration) was open for read only when the primary is restarted (or switchover happens between primary and another standby in a multiple standby configuration).
The setup is same setup used in postupgrading from 12.2 to 18c. The exact version of 18c is 18.6. All the servers are hosted on AWS. The current setup is as follows.
DGMGRL> show configuration

Configuration - dg12c2

  Protection Mode: MaxAvailability
  Members:
  colombo - Primary database
    london  - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 51 seconds ago)
When the primary is restarted
[oracle@ip-172-31-20-117 trace]$ srvctl stop database -d colombo
[oracle@ip-172-31-20-117 trace]$ srvctl start database -d colombo
redo shipping to standby stops.
Following message could be seen on the alert log of the standby instance (instance open in read only mode, london).
2019-05-29T05:38:20.414031-04:00
 rfs (PID:9340): Client is running on host ip-172-31-20-117.eu-west-1.compute.internal, not the current host ip-172-31-15-199.eu-west-1.compute.internal
2019-05-29T05:38:21.974469-04:00
 rfs (PID:9348): Client is running on host ip-172-31-20-117.eu-west-1.compute.internal, not the current host ip-172-31-15-199.eu-west-1.compute.internal
On the alert log of the primary (colombo) following could be seen.
 2019-05-29T05:38:22.283282-04:00
Errors in file /opt/app/oracle/diag/rdbms/colombo/colombo/trace/colombo_tt00_10673.trc:
ORA-03186: Cannot start Oracle ADG recovery on a non-Oracle Cloud database on a server that is not a primary server.
If the error message shown is
ORA-03816: Message 3816 not found;  product=RDBMS; facility=ORA
then apply patch 27539475 (on 18c) to get the above error message.
Inside the trace file following lines shows that attempts were made to ship redo to the instance open in read only mode.
*** 2019-05-29T05:38:22.282064-04:00
krsu_upi_status: Error 3186 attaching RFS server to standby instance at host 'londontns'
krsi_verify_network: Error 3186 attaching to LOG_ARCHIVE_DEST_2 standby host londontns
 at 0x7fffa4dc23c8 placed krsg.c@4998
ORA-03186: Cannot start Oracle ADG recovery on a non-Oracle Cloud database on a server that is not a primary server.
Data guard configuration shows error status
DGMGRL> show configuration

Configuration - dg12c2

  Protection Mode: MaxAvailability
  Members:
  colombo - Primary database
    Error: ORA-16810: multiple errors or warnings detected for the member

    london  - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ERROR   (status updated 10 seconds ago)
Primary database configuration and status is shown below.
DGMGRL> show database verbose colombo

Database - colombo

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    colombo
      Error: ORA-16737: the redo transport service for member "london" has an error

  Database Warning(s):
    ORA-16629: database reports a different protection level from the protection mode

  Properties:
    DGConnectIdentifier             = 'colombotns'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'SYNC'
    RedoRoutes                      = ''
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '30'
    TransportLagThreshold           = '30'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    ApplyInstances                  = '0'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '10'
    LogArchiveMinSucceedDest        = '1'
    DataGuardSyncLatency            = '0'
    DbFileNameConvert               = '/london/, /colombo/'
    LogFileNameConvert              = '/london/, /colombo/'
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    PreferredObserverHosts          = ''
    HostName                        = 'ip-172-31-20-117.eu-west-1.compute.internal'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-20-117.eu-west-1.compute.internal)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=colombo_DGMGRL.domain.net)(INSTANCE_NAME=colombo)(SERVER=DEDICATED)))'
    OnlineArchiveLocation           = ''
    OnlineAlternateLocation         = ''
    StandbyArchiveLocation          = ''
    StandbyAlternateLocation        = ''
    LogArchiveTrace                 = '2049'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'
    SidName                         = '(monitor)'

  Log file locations:
    Alert log               : /opt/app/oracle/diag/rdbms/colombo/colombo/trace/alert_colombo.log
    Data Guard Broker log   : /opt/app/oracle/diag/rdbms/colombo/colombo/trace/drccolombo.log

Database Status:
ERROR
Standby database configuration
DGMGRL> show database verbose london

Database - london

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 203 seconds ago)
  Apply Lag:          0 seconds (computed 203 seconds ago)
  Average Apply Rate: 62.00 KByte/s
  Active Apply Rate:  0 Byte/s
  Maximum Apply Rate: 0 Byte/s
  Real Time Query:    ON
  Instance(s):
    london

  Database Warning(s):
    ORA-16857: member disconnected from redo source for longer than specified threshold

  Properties:
    DGConnectIdentifier             = 'londontns'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'SYNC'
    RedoRoutes                      = ''
    DelayMins                       = '0'
    Binding                         = 'OPTIONAL'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '30'
    TransportLagThreshold           = '30'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    ApplyInstances                  = '0'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '10'
    LogArchiveMinSucceedDest        = '1'
    DataGuardSyncLatency            = '0'
    DbFileNameConvert               = '/colombo/, /london/'
    LogFileNameConvert              = '/colombo/, /london/'
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    PreferredObserverHosts          = ''
    HostName                        = 'ip-172-31-15-199.eu-west-1.compute.internal'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-15-199.eu-west-1.compute.internal)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=london_DGMGRL.domain.net)(INSTANCE_NAME=london)(SERVER=DEDICATED)))'
    OnlineArchiveLocation           = ''
    OnlineAlternateLocation         = ''
    StandbyArchiveLocation          = ''
    StandbyAlternateLocation        = ''
    LogArchiveTrace                 = '2049'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'
    SidName                         = '(monitor)'

  Log file locations:
    Alert log               : /opt/app/oracle/diag/rdbms/london/london/trace/alert_london.log
    Data Guard Broker log   : /opt/app/oracle/diag/rdbms/london/london/trace/drclondon.log

Database Status:
WARNING




There are no inconsistent properties in any of the databases.
DGMGRL> show database colombo 'InconsistentProperties';
INCONSISTENT PROPERTIES
   INSTANCE_NAME        PROPERTY_NAME         MEMORY_VALUE         SPFILE_VALUE         BROKER_VALUE

DGMGRL>  show database london  'InconsistentProperties';
INCONSISTENT PROPERTIES
   INSTANCE_NAME        PROPERTY_NAME         MEMORY_VALUE         SPFILE_VALUE         BROKER_VALUE
The archive dest status on the primary shows the same error as alert log.
SQL> select dest_id id,dest_name name, status, database_mode db_mode,recovery_mode,         protection_mode,standby_logfile_count "SRLs",        standby_logfile_active ACTIVE,         archived_seq#,error  from v$archive_dest_status where dest_id=2;

        ID NAME                 STATUS    DB_MODE         RECOVERY_MODE                      PROTECTION_MODE            SRLs     ACTIVE ARCHIVED_SEQ# ERROR
---------- -------------------- --------- --------------- ---------------------------------- -------------------- ---------- ---------- ------------- ----------------------------------------
         2 LOG_ARCHIVE_DEST_2   ERROR     UNKNOWN         IDLE                               RESYNCHRONIZATION             0          0             0 ORA-03186: Cannot start Oracle ADG
                                                                                                                                                      recovery on a non-Oracle Cloud database
                                                                                                                                                      on a server that is not a primary
                                                                                                                                                      server.

To resolve the redo shipping issue, restart the instance open in read only mode.
[oracle@ip-172-31-15-199 trace]$ srvctl stop database -d london
[oracle@ip-172-31-15-199 trace]$ srvctl start database -d london
Once it is started the redo shipping begins and log archive dest error clears.
DGMGRL> show configuration

Configuration - dg12c2

  Protection Mode: MaxAvailability
  Members:
  colombo - Primary database
    london  - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 11 seconds ago)

The other workaround is to put the read only instance into mount mode before the primary is stopped or switchover happens and open it in read only mode afterwards. Either method would result in clients being disconnected to from the read only instance.
The same issue was observed on a 19c multiple data guard configuration hosted on AWS.
DGMGRL> show configuration
Configuration - fcdg
Protection Mode: MaxPerformance
Members:
gold - Primary database
silver - Physical standby database
bronze - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS. 
The silver instance is in mount only mode and bronze is open read only mode. When there was a switchover to between gold and silver, redo transport to bronze stopped from the new primary (silver) with the same error as in 18c. Only way to resolve the issue was to restart the instance open in read only mode.
However, 11.2.0.4 data guard configuration hosted on AWS did not have this issue. The redo shipping was working fine across primary restarts when the standby was open in read only mode. Seems whatever issue or removal of capabilities on non-oracle clouds happened in later versions.

Update on 2019-09-24
The issue is fixed with 18.7 RU and for 19c with the 19.4 RU. For other versions apply patch 30289758 ( 30289758: MERGE ON DATABASE RU 18.5.0.0.0 OF 27539475 29430524 ) if available.