Friday, September 12, 2014

Adding Far Sync Instances to Existing Data Guard Configuration

This post lists the steps for adding far sync instances to an existing data guard configuration (for both the primary and the standby database, the latter to be used when the standby becomes primary). Far sync is a new feature introduced with 12c that allows redo to be transported synchronously from the primary database to a "nearby" far sync instance, which then transports the redo asynchronously over a longer distance. The idea is that transporting redo synchronously to a nearby far sync instance imposes low overhead on the primary compared to synchronous transport over a long distance, while still achieving zero data loss, and it also off-loads redo transport from the primary.
The Oracle documentation provides a complete description of the far sync concept. Given below are a few important excerpts.
Many configurations have a primary database shipping redo to a standby database using asynchronous transport at the risk of some data loss at failover time. Using synchronous redo transport to achieve zero data loss may not be a viable option because of the impact on the commit response times at the primary due to network latency between the two databases. Creating a far sync instance close to the primary has the benefit of minimizing impact on commit response times to an acceptable threshold (due to the smaller network latency between primary and far sync instance) while allowing for higher data protection guarantees -- if the primary were to fail, and assuming the far sync instance was synchronized at the time of the failure, the far sync instance and the terminal standby would coordinate a final redo shipment from the far sync instance to the standby to ship any redo not yet available to the Standby and then perform a zero-data-loss failover.
A far sync instance manages a control file, receives redo into standby redo logs (SRLs), and archives those SRLs to local archived redo logs. A far sync instance does not have user data files, cannot be opened for access, cannot run redo apply, and can never function in the primary role or be converted to any type of standby database.
Far sync instances are part of the Oracle Active Data Guard Far Sync feature, which requires an Oracle Active Data Guard license.
In a configuration that contains a far sync instance, there must still be a direct network connection between the primary database and the remote standby database. The direct connection between the primary and the remote standby is used to perform health checks and switchover processing tasks. It is not used for redo transport unless the standby has been configured as an alternate destination in case the far sync instance fails and there is no alternate far sync configured to maintain the protection level.
The existing data guard configuration's primary database parameter settings and the active data guard creation script for the standby database are given below (other prerequisites for setting up data guard are omitted).
Primary database parameter changes
alter system set log_archive_config='dg_config=(ent12c1,ent12c1s)' scope=both ;
alter system set log_archive_dest_1='location=use_db_recovery_file_dest valid_for=(all_logfiles,all_roles) db_unique_name=ent12c1' scope=both;
alter system set log_archive_dest_2='service=ENT12C1STNS ASYNC NOAFFIRM max_failure=10 max_connections=5 reopen=180 valid_for=(online_logfiles,primary_role) db_unique_name=ent12c1s' scope=both;
alter system set log_archive_dest_state_2='defer' scope=both;
alter system set log_archive_dest_state_1='enable' scope=both;
alter system set fal_server='ENT12C1STNS' scope=both;
alter system set log_archive_max_processes=10 scope=both;
alter system set db_file_name_convert='/opt/app/oracle/oradata/ENT12C1S','/data/oradata/ENT12C1' scope=spfile;
alter system set log_file_name_convert='/opt/app/oracle/oradata/ENT12C1S','/data/oradata/ENT12C1' ,'/opt/app/oracle/fast_recovery_area/ENT12C1S','/data/flash_recovery/ENT12C1' scope=spfile;
alter system set standby_file_management='AUTO' scope=both;
Standby creation script
mkdir -p  /opt/app/oracle/oradata/ENT12C1S/controlfile
mkdir -p /opt/app/oracle/fast_recovery_area/ENT12C1S/controlfile

duplicate target database for standby from active database spfile
parameter_value_convert 'ent12c1','ent12c1s','ENT12C1','ENT12C1S','data','opt/app/oracle','flash_recovery','fast_recovery_area'
set db_unique_name='ent12c1s'
set db_create_file_dest='/opt/app/oracle/oradata'
set db_recovery_file_dest='/opt/app/oracle/fast_recovery_area'
set db_file_name_convert='/data/oradata/ENT12C1','/opt/app/oracle/oradata/ENT12C1S'
set log_file_name_convert='/data/oradata/ENT12C1','/opt/app/oracle/oradata/ENT12C1S','/data/flash_recovery/ENT12C1','/opt/app/oracle/fast_recovery_area/ENT12C1S'
set log_archive_max_processes='10'
set fal_server='ENT12C1TNS'
set log_archive_dest_2='service=ENT12C1TNS ASYNC NOAFFIRM max_failure=10 max_connections=5 reopen=180 valid_for=(online_logfiles,primary_role) db_unique_name=ent12c1'
set log_archive_dest_1='location=use_db_recovery_file_dest valid_for=(all_logfiles,all_roles) db_unique_name=ent12c1s';
This single-instance data guard configuration, which transports redo asynchronously, is transformed as shown below.
In situation 1, ENT12C1 is the primary database, ENT12C1S is the standby, and FS12C1 is the far sync instance to which the primary ships redo synchronously.
After a role switch, when ENT12C1S becomes the new primary, it uses FS12C1S as its far sync instance for synchronous redo transport.
In both situations there is a direct redo transport path between primary and standby, which transports redo asynchronously in case of far sync instance failure. Once the far sync instance is back up again, the data guard configuration reverts to using it for redo transport. If standby redo logs were created on the primary, then standby redo logs will be created automatically on the far sync instances when they are in use for redo transport.
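As a sketch, if standby redo logs do not yet exist on the primary, they can be added with statements like the following (the 50M size is illustrative; standby redo logs must match the online redo log size, with one more group per thread than there are online log groups):

```sql
-- Illustrative: size must equal the online redo log size; create one more
-- standby redo log group than there are online redo log groups per thread.
alter database add standby logfile thread 1 size 50m;
alter database add standby logfile thread 1 size 50m;
alter database add standby logfile thread 1 size 50m;
alter database add standby logfile thread 1 size 50m;
```

With db_create_file_dest set, the log file names are generated automatically (OMF), so no file specification is needed.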
1. On the servers used for creating the far sync instances, install the Oracle database software and create a listener. There's no requirement for a static listener configuration, as the far sync instance registers with the listener automatically.
2. Create TNS entries for the far sync instances on the existing databases (primary and standby) and copy the existing TNS entries into the far sync instances' tnsnames.ora files.
cat tnsnames.ora

ENT12C1TNS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ent12c1-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ent12c1)
    )
  )

ENT12C1STNS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ent12c1s-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ent12c1s)
    )
  )

FS12C1TNS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = fs12c1-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = fs12c1)
    )
  )

FS12C1STNS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = fs12c1s-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = fs12c1s)
    )
  )
3. Create a control file for the far sync instances by connecting to the primary database. The same control file is used for both far sync instances in this case (fs12c1 and fs12c1s).
SQL> ALTER DATABASE CREATE FAR SYNC INSTANCE CONTROLFILE AS '/home/oracle/controlfs.ctl';
4. Copy the control file to the far sync instances. In this case the control file is multiplexed and renamed on each far sync instance to match the control_files parameter shown in the pfiles in subsequent steps (copy the file to each location listed there).
scp controlfs.ctl fs12c1-host:/opt/app/oracle/oradata/FS12C1/controlfile/controlfs01.ctl
scp controlfs.ctl fs12c1s-host:/opt/app/oracle/oradata/FS12C1S/controlfile/control01.ctl
Similarly, copy the password file from the primary to $ORACLE_HOME/dbs on the servers where the far sync instances will be created. For far sync instances fs12c1 and fs12c1s the password file needs to be renamed orapwfs12c1 and orapwfs12c1s respectively.

5. Create a pfile from the primary spfile. This will be modified to reflect the far sync instance settings.
SQL> create pfile='/home/oracle/pfilefs.ora' from spfile;
6. Copy the pfile to the far sync instances ($ORACLE_HOME/dbs) and rename them to reflect the instance names (e.g. initfs12c1.ora and initfs12c1s.ora). Modify the init file used for the primary's far sync instance (fs12c1) as shown below. Not all of the parameters are needed for far sync, and those could be removed from the pfile.
cat initfs12c1.ora

*.audit_file_dest='/opt/app/oracle/admin/fs12c1/adump'
*.audit_trail='OS'
*.compatible='12.1.0.2'
*.control_files='/opt/app/oracle/oradata/FS12C1/controlfile/controlfs01.ctl','/opt/app/oracle/oradata/FS12C1/controlfile/controlfs02.ctl'
*.db_block_size=8192
*.db_create_file_dest='/opt/app/oracle/oradata'
*.db_name='ent12c1'
*.db_recovery_file_dest_size=21474836480
*.db_recovery_file_dest='/opt/app/oracle/fast_recovery_area'
*.db_unique_name='fs12c1'
*.diagnostic_dest='/opt/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=fs12c1XDB)'
*.fal_server='ENT12C1TNS'
*.log_archive_config='dg_config=(ent12c1,ent12c1s,fs12c1,fs12c1s)'
*.log_archive_dest_1='location=use_db_recovery_file_dest valid_for=(all_logfiles,all_roles) db_unique_name=fs12c1'
*.log_archive_dest_2='service=ENT12C1STNS ASYNC NOAFFIRM valid_for=(STANDBY_LOGFILES,standby_role) db_unique_name=ent12c1s max_failure=10 max_connections=5 reopen=180'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='enable'
*.log_archive_format='%t_%s_%r.dbf'
*.log_archive_max_processes=10
*.log_file_name_convert='/data/oradata/ENT12C1','/opt/app/oracle/oradata/FS12C1','/data/flash_recovery/ENT12C1','/opt/app/oracle/fast_recovery_area/FS12C1'
*.remote_login_passwordfile='EXCLUSIVE'
*.standby_file_management='AUTO'
7. Mount the far sync instance using the pfile and then create an spfile from the pfile. Without an spfile, a warning is shown when the far sync instance is added to a data guard broker configuration. Besides that, an spfile also allows subsequent parameter changes without restarting the far sync instance. Restart (mount) the far sync instance using the spfile.
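In SQL*Plus this step might look like the following (a sketch, run as SYSDBA on the far sync host; since the init file is already in $ORACLE_HOME/dbs it is picked up by default):

```sql
startup mount                 -- mounts using initfs12c1.ora from $ORACLE_HOME/dbs
create spfile from pfile;     -- writes spfilefs12c1.ora alongside it
shutdown immediate
startup mount                 -- now running off the spfile
```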



8. Similarly, create the pfile for FS12C1S, the far sync instance used by the current standby (when it becomes the primary).
cat initfs12c1s.ora

*.audit_file_dest='/opt/app/oracle/admin/fs12c1s/adump'
*.audit_trail='OS'
*.compatible='12.1.0.2'
*.control_files='/opt/app/oracle/oradata/FS12C1S/controlfile/control01.ctl','/opt/app/oracle/fast_recovery_area/FS12C1S/controlfile/control02.ctl'
*.db_create_file_dest='/opt/app/oracle/oradata'
*.db_name='ent12c1'
*.db_recovery_file_dest_size=21474836480
*.db_recovery_file_dest='/opt/app/oracle/fast_recovery_area'
*.db_unique_name='fs12c1s'
*.diagnostic_dest='/opt/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=fs12c1sXDB)'
*.fal_server='ENT12C1STNS'
*.log_archive_config='dg_config=(ent12c1,ent12c1s,fs12c1,fs12c1s)'
*.log_archive_dest_1='location=use_db_recovery_file_dest valid_for=(all_logfiles,all_roles) db_unique_name=fs12c1s'
*.log_archive_dest_2='service=ENT12C1TNS ASYNC NOAFFIRM valid_for=(standby_logfiles,standby_role) db_unique_name=ent12c1 max_failure=10 max_connections=5 reopen=180'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='enable'
*.log_archive_format='%t_%s_%r.dbf'
*.log_archive_max_processes=10
*.log_file_name_convert='/data/oradata/ENT12C1','/opt/app/oracle/oradata/FS12C1S','/data/flash_recovery/ENT12C1','/opt/app/oracle/fast_recovery_area/FS12C1S'
*.remote_login_passwordfile='EXCLUSIVE'
*.standby_file_management='AUTO'
9. As in step 7, create an spfile and restart (mount) the far sync instance using the spfile.

10. Update the log_archive_config parameter on both the primary and the standby to include the far sync instances as well.
alter system set log_archive_config='dg_config=(ent12c1,ent12c1s,fs12c1,fs12c1s)' scope=both ;
11. Update the fal_server parameter on the primary (ent12c1) as below, which allows ent12c1 (when it becomes a standby) to fetch archive logs either from the new primary (ent12c1s) or from the far sync instance (fs12c1s).
alter system set fal_server='ENT12C1STNS','FS12C1STNS' scope=both;
12. Update the fal_server parameter on the standby (ent12c1s) so that it can fetch archive logs either from ent12c1 (the primary) or from the far sync instance (fs12c1).
alter system set fal_server='ENT12C1TNS','FS12C1TNS' scope=both;
13. Update the log archive destination and its state on the primary such that redo transport is synchronous between the primary and the far sync instance, and asynchronous between the primary and the standby (direct). Furthermore, the asynchronous log archive destination is set to state 'alternate' so that when the synchronous destination fails, the data guard configuration starts shipping redo via this alternate destination.
alter system set log_archive_dest_state_2='enable' scope=both;
alter system set log_archive_dest_2='service=FS12C1TNS SYNC AFFIRM  db_unique_name=fs12c1  max_failure=1 valid_for=(online_logfiles,primary_role) alternate=log_archive_dest_3 max_connections=5' scope=both;

alter system set log_archive_dest_state_3='alternate' scope=both;
alter system set log_archive_dest_3='service=ENT12C1STNS ASYNC NOAFFIRM  db_unique_name=ent12c1s  valid_for=(online_logfiles,primary_role) alternate=log_archive_dest_2 max_failure=10 max_connections=5 reopen=180' scope=both;
14. Change and add log archive destination settings on the standby database so that when it becomes primary (ent12c1s) it too can use a far sync instance for synchronous redo transport and, failing that, use asynchronous redo transport directly to the standby at the time (ent12c1).
alter system set log_archive_dest_state_2='enable' scope=both;
alter system set log_archive_dest_2='service=FS12C1STNS SYNC AFFIRM db_unique_name=fs12c1s valid_for=(online_logfiles,primary_role) alternate=log_archive_dest_3 max_connections=5  max_failure=1' scope=both;

alter system set log_archive_dest_state_3='alternate' scope=both;
alter system set log_archive_dest_3='service=ENT12C1TNS ASYNC NOAFFIRM  db_unique_name=ent12c1 valid_for=(online_logfiles,primary_role) alternate=log_archive_dest_2 max_failure=10 max_connections=5 reopen=180' scope=both;
15. Increase the protection mode of the primary database to maximum availability.
ALTER DATABASE SET STANDBY TO MAXIMIZE AVAILABILITY;

SQL> SELECT PROTECTION_MODE FROM V$DATABASE;

PROTECTION_MODE
--------------------
MAXIMUM AVAILABILITY
16. Do a few log switches and verify that redo transport is happening via the far sync instance. The easiest way to monitor this is through the alert logs: if a log switch was transported via the far sync instance, it will be logged in both the far sync instance's and the standby's alert logs. Any issues with a log archive destination can be observed on the primary
SQL> SELECT DEST_NAME,STATUS,DESTINATION FROM V$ARCHIVE_DEST WHERE DESTINATION IS NOT NULL;

DEST_NAME                      STATUS    DESTINATION
------------------------------ --------- ------------------------------
LOG_ARCHIVE_DEST_1             VALID     USE_DB_RECOVERY_FILE_DEST
LOG_ARCHIVE_DEST_2             VALID     FS12C1TNS
LOG_ARCHIVE_DEST_3             ALTERNATE ENT12C1STNS
and on the far sync instance
SQL> SELECT DEST_NAME,STATUS,DESTINATION FROM V$ARCHIVE_DEST WHERE DESTINATION IS NOT NULL;

DEST_NAME                      STATUS    DESTINATION
------------------------------ --------- ------------------------------
LOG_ARCHIVE_DEST_1             VALID     USE_DB_RECOVERY_FILE_DEST
LOG_ARCHIVE_DEST_2             VALID     ENT12C1STNS
STANDBY_ARCHIVE_DEST           VALID     USE_DB_RECOVERY_FILE_DEST
The above output shows that the primary is transporting to log archive dest 2, whose status is valid, while dest 3 is still in the alternate state. The far sync output shows that the far sync instance is shipping redo as per its log archive dest 2 value and that its current status is valid.

17. Shutdown abort the far sync instance and view the archive dest output. If the configuration works properly, log archive dest 3 should become valid and redo transport should happen directly between primary and standby in asynchronous mode.
SQL> SELECT DEST_NAME,STATUS,DESTINATION FROM V$ARCHIVE_DEST WHERE DESTINATION IS NOT NULL;

DEST_NAME                      STATUS    DESTINATION
------------------------------ --------- ------------------------------
LOG_ARCHIVE_DEST_1             VALID     USE_DB_RECOVERY_FILE_DEST
LOG_ARCHIVE_DEST_2             ALTERNATE FS12C1TNS
LOG_ARCHIVE_DEST_3             VALID     ENT12C1STNS
From the above output it can be seen that after the far sync instance is terminated, log archive dest 3 has become the valid destination and log archive dest 2 is kept as an alternate destination. On the primary instance's alert log the following could be observed when the far sync instance is terminated
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Failed to archive log 1 thread 1 sequence 1503 (3113)
Destination LOG_ARCHIVE_DEST_2 no longer supports SYNCHRONIZATION
and on the standby instance alert log the following
Primary database is in MAXIMUM PERFORMANCE mode
Changing standby controlfile to MAXIMUM PERFORMANCE mode
It must also be mentioned that on a few occasions where the far sync instance was abruptly terminated (shutdown abort), the recovery process on the standby stopped due to lost writes
MRP0: Background Media Recovery terminated with error 742  <-- far sync instance terminated
Fri Sep 12 17:32:28 2014
Errors in file /opt/app/oracle/diag/rdbms/ent12c1s/ent12c1s/trace/ent12c1s_pr00_10098.trc:
ORA-00742: Log read detects lost write in thread 1 sequence 1503 block 868
ORA-00312: online log 4 thread 1: '/opt/app/oracle/fast_recovery_area/ENT12C1S/onlinelog/o1_mf_4_b0y1dn8v_.log'
ORA-00312: online log 4 thread 1: '/opt/app/oracle/oradata/ENT12C1S/onlinelog/o1_mf_4_b0y1dml4_.log'
Managed Standby Recovery not using Real Time Apply
RFS[16]: Assigned to RFS process (PID:10165)
RFS[16]: Selected log 5 for thread 1 sequence 1504 dbid 209099011 branch 833730501
Fri Sep 12 17:32:28 2014
Recovery interrupted!
Recovered data files to a consistent state at change 19573793
Fri Sep 12 17:32:28 2014
Errors in file /opt/app/oracle/diag/rdbms/ent12c1s/ent12c1s/trace/ent12c1s_pr00_10098.trc:
ORA-00742: Log read detects lost write in thread 1 sequence 1503 block 868
ORA-00312: online log 4 thread 1: '/opt/app/oracle/fast_recovery_area/ENT12C1S/onlinelog/o1_mf_4_b0y1dn8v_.log'
ORA-00312: online log 4 thread 1: '/opt/app/oracle/oradata/ENT12C1S/onlinelog/o1_mf_4_b0y1dml4_.log'
Fri Sep 12 17:32:28 2014
MRP0: Background Media Recovery process shutdown (ent12c1s)
Fri Sep 12 17:32:28 2014
Archived Log entry 107 added for thread 1 sequence 1503 rlc 833730501 ID 0xde697d0 dest 3:
Fri Sep 12 17:38:51 2014
alter database recover managed standby database disconnect  <-- manual start at 17:38 after it was stopped 17:32
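One way to keep an eye on managed recovery on the standby is to query v$managed_standby; an MRP0 row is present only while redo apply is running (a sketch):

```sql
-- Run on the standby: no MRP row means managed recovery has stopped
select process, status, thread#, sequence#
from v$managed_standby
where process like 'MRP%';
```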
So it may be a good idea to keep an eye on recovery when the far sync instance terminates. According to MOS note 1302539.1, when Active Data Guard is in place there is automatic block repair that is transparent to the user. Once the far sync instance is back up again, redo transport goes back to the original setting
SQL>  SELECT DEST_NAME,STATUS,DESTINATION FROM V$ARCHIVE_DEST WHERE DESTINATION IS NOT NULL;

DEST_NAME                      STATUS    DESTINATION
------------------------------ --------- ------------------------------
LOG_ARCHIVE_DEST_1             VALID     USE_DB_RECOVERY_FILE_DEST
LOG_ARCHIVE_DEST_2             VALID     FS12C1TNS
LOG_ARCHIVE_DEST_3             ALTERNATE ENT12C1STNS
And the following could be observed in the alert log of the primary
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Standby redo logfile selected to archive thread 1 sequence 1506
LGWR: Standby redo logfile selected for thread 1 sequence 1506 for destination LOG_ARCHIVE_DEST_2
Fri Sep 12 16:21:08 2014
Thread 1 advanced to log sequence 1506 (LGWR switch)
  Current log# 1 seq# 1506 mem# 0: /data/oradata/ENT12C1/onlinelog/o1_mf_1_9bcsl3ds_.log
  Current log# 1 seq# 1506 mem# 1: /data/flash_recovery/ENT12C1/onlinelog/o1_mf_1_9bcsl3hm_.log
Fri Sep 12 16:21:08 2014
Archived Log entry 1408 added for thread 1 sequence 1505 ID 0xde697d0 dest 1:
Fri Sep 12 16:21:11 2014
Thread 1 cannot allocate new log, sequence 1507
Checkpoint not complete
  Current log# 1 seq# 1506 mem# 0: /data/oradata/ENT12C1/onlinelog/o1_mf_1_9bcsl3ds_.log
  Current log# 1 seq# 1506 mem# 1: /data/flash_recovery/ENT12C1/onlinelog/o1_mf_1_9bcsl3hm_.log
Fri Sep 12 16:21:14 2014
Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
On the alert log of the far sync instance
Primary database is in MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
Standby controlfile consistent with primary
RFS[1]: Assigned to RFS process (PID:3557)
RFS[1]: Selected log 5 for thread 1 sequence 1506 dbid 209099011 branch 833730501
Fri Sep 12 17:04:06 2014
******************************************************************
TT00: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
TT00: Standby redo logfile selected for thread 1 sequence 1506 for destination LOG_ARCHIVE_DEST_2
Fri Sep 12 17:04:07 2014
Changing standby controlfile to MAXIMUM PERFORMANCE mode
RFS[2]: Assigned to RFS process (PID:3561)
RFS[2]: Opened log for thread 1 sequence 1505 dbid 209099011 branch 833730501
Fri Sep 12 17:04:08 2014
Archived Log entry 203 added for thread 1 sequence 1505 rlc 833730501 ID 0xde697d0 dest 2:
Fri Sep 12 17:04:13 2014
Changing standby controlfile to MAXIMUM AVAILABILITY mode
On the standby alert log
Primary database is in MAXIMUM PERFORMANCE mode
Changing standby controlfile to MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
Similarly, it is possible to shut down the standby instance and check whether the primary is able to ship redo to the far sync instance and whether the redo is fetched by the standby once it is started again. With these tests, situation 1 in the figure shown at the beginning of the post is complete.

18. To test situation 2 in the figure above, do a switchover and check redo transport via the far sync instance (fs12c1s).
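With 12c the switchover itself can be driven with SQL alone; a sketch of the commands involved (the target is identified by its db_unique_name):

```sql
-- On the current primary (ent12c1): check readiness, then switch over
alter database switchover to ent12c1s verify;
alter database switchover to ent12c1s;

-- On the new primary (ent12c1s)
alter database open;

-- On the old primary, now standby (ent12c1): restart mounted and resume apply
startup mount
alter database recover managed standby database disconnect;
```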

This concludes the addition of far sync instances to an existing data guard configuration on 12c.

Useful metalink notes
Cascaded Standby Databases in Oracle 12c [ID 1542969.1]
Data Guard 12c New Feature: Far Sync Standby [ID 1565071.1]
Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration [ID 1302539.1]

Tuesday, September 2, 2014

LOB Chunk and Tablespace Block Size

The chunk value corresponds to the data size used by Oracle when reading or writing a LOB value. Once set, the chunk size cannot be changed. Though it doesn't matter for LOBs stored in row, for out-of-row LOBs space is used in multiples of the chunk size.
From the Oracle documentation (for basicfile LOBs): A chunk is one or more Oracle blocks. You can specify the chunk size for the BasicFiles LOB when creating the table that contains the LOB. This corresponds to the data size used by Oracle Database when accessing or modifying the LOB value. Part of the chunk is used to store system-related information and the rest stores the LOB value. If the tablespace block size is the same as the database block size, then CHUNK is also a multiple of the database block size. The default CHUNK size is equal to the size of one tablespace block, and the maximum value is 32K. Once the value of CHUNK is chosen (when the LOB column is created), it cannot be changed. Hence, it is important that you choose a value which optimizes your storage and performance requirements.
For securefile, CHUNK is an advisory size and is provided for backward compatibility purposes.
From a performance perspective, accessing LOBs in big chunks is considered more efficient. You can set CHUNK to the data size most frequently accessed or written. For example, if only one block of LOB data is accessed at a time, then set CHUNK to the size of one block. If you have big LOBs and read or write big amounts of data, then choose a large value for CHUNK.
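The chunk size actually in effect for a LOB column can be confirmed from the data dictionary; for the test table used below (SES), a query like this (a sketch) shows it:

```sql
-- CHUNK reports the chunk size (in bytes) in effect for each LOB column
select table_name, column_name, chunk, cache, in_row
from user_lobs
where table_name = 'SES';
```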
This post shows the results of a test case carried out to compare the performance benefits of using a large chunk size along with a tablespace with a large block size (8K vs 32K). The BLOB used for the test case is 800KB. The Java code used for the test case is given at the end of the post. For each chunk size (8K vs 32K) the caching option was also changed (nocache vs cache), as it too has a direct impact on IO usage. The LOB is stored out of row, in a separate tablespace from the table.
The table creation DDL used for the basicfile test is shown below (comment and uncomment each option based on the test case). LOB32KTBS is a tablespace with a 32K block size, while LOB8KTBS is a tablespace with an 8K block size.
create table ses(sesid varchar2(100), sesob blob) SEGMENT CREATION IMMEDIATE TABLESPACE users
 LOB
  (
    sesob
  )
  STORE AS object_lob_seg (
  TABLESPACE LOB32KTBS
  --TABLESPACE LOB8KTBS
  DISABLE STORAGE IN ROW
  CHUNK 32K
  --CHUNK 8K
                CACHE
  --NOCACHE
  PCTVERSION 0
        STORAGE (MAXEXTENTS UNLIMITED)
        INDEX object_lob_idx (
            TABLESPACE LOB32KTBS
     --TABLESPACE LOB8KTBS
            STORAGE (MAXEXTENTS UNLIMITED)
        )
    )
/
The test cases comprised reading a LOB column for a row and inserting LOBs. The IO-related statistics comparison for the select test case is given below. Based on the graphs it can be seen that a 32K chunk size on a tablespace with a 32K block size requires fewer logical and physical reads than an 8K chunk with the LOB segment on an 8K block size tablespace. Though not shown on the graphs, in a separate test, using a 32K chunk size while placing the LOB segment on an 8K block size tablespace had the same performance characteristics as an 8K chunk on an 8K block size tablespace. On the other hand, an 8K chunk with the LOB segment on a 32K block size tablespace had the same performance characteristics as a 32K chunk on a 32K block size tablespace. This means that the chunk size alone is not going to reduce the amount of IO; the block size of the tablespace where the LOB segment is stored has an influence as well.

The next test was the inserting of LOBs. The results are shown in the following two graphs. Similar to the read test, having a large chunk size and tablespace block size for the LOB reduces IO.




The same test was carried out for securefile LOB segments. The table creation DDL is given below. The only difference compared to the basicfile DDL is "RETENTION NONE". All other parameters/options are the same.
create table ses(sesid varchar2(100), sesob blob) SEGMENT CREATION IMMEDIATE TABLESPACE users
 LOB
  (
    sesob
  )
  STORE AS  securefile object_lob_seg (
  TABLESPACE LOB32KTBS
  --TABLESPACE LOB8KTBS
  DISABLE STORAGE IN ROW
  CHUNK 32K
  --CHUNK 8K
                CACHE
  --NOCACHE
  RETENTION NONE
        STORAGE (MAXEXTENTS UNLIMITED)
        INDEX object_lob_idx (
            TABLESPACE LOB32KTBS
   --TABLESPACE LOB8KTBS
            STORAGE (MAXEXTENTS UNLIMITED)
        )
    )
/
The results of the select test are shown in the graphs below. Similar to basicfile, the larger chunk size advisory and tablespace block size combination outperforms the smaller chunk/block size combination. In all cases securefile outperforms basicfile in the amount of logical and physical reads.

The outcome of the insert test is also the same as that of the basicfile insert test: the larger chunk/block size combination outperforms the smaller chunk/block size combination. Again, between basicfile and securefile, securefile outperforms basicfile.

These tests have shown that it's better to use large chunk/tablespace block sizes for larger LOBs to reduce the logical/physical IO related to LOBs.

Useful White papers
SecureFile Performance
Oracle 11g: SecureFiles

Related Post
Nologging vs Logging for LOBs and enq: CF - contention

Java code used for the test is given below. For the code of the LobStat class, refer to the earlier post.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import oracle.jdbc.OraclePreparedStatement;
import oracle.jdbc.pool.OracleDataSource;

public class LobChunkTest {

    final String URL = "jdbc:oracle:thin:@192.168.0.66:1521:ent11g2";
    final String USERNAME = "asanga";
    final String PASSWORD = "asa";

    public static void main(String[] args) {

        LobChunkTest test = new LobChunkTest();
        //Insert test
        test.insertTest();

        System.out.println("\n\n************* end of insert test **************\n\n");

        //select test
        test.selectTest();

    }

    public void insertTest() {
        try {

            OracleDataSource pool = new OracleDataSource();
            pool.setURL(URL);
            pool.setUser(USERNAME);
            pool.setPassword(PASSWORD);

            Connection con = pool.getConnection();
            con.setAutoCommit(false);

            long t1 = System.currentTimeMillis();

            LobStat.displayStats(con);

            byte[] x = new byte[800 * 1024];
            x[1] = 10;
            x[798 * 1024] = 20;

            for (int i = 0; i < 100; i++) {

                OraclePreparedStatement pr = (OraclePreparedStatement) con.prepareStatement("insert into ses values(?,?)");

                String sesid = "abcdefghijklmnopqrstuvwxy" + Math.random();
                pr.setString(1, sesid);
                pr.setBytes(2, x);

                pr.execute();
                con.commit();
                pr.close();

            }

            long t2 = System.currentTimeMillis();

            LobStat.displayStats(con);

            con.close();

            System.out.println("time taken " + (t2 - t1));

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public void selectTest() {
        try {

            OracleDataSource pool = new OracleDataSource();
            pool.setURL(URL);
            pool.setUser(USERNAME);
            pool.setPassword(PASSWORD);

            Connection con = pool.getConnection();
            con.setAutoCommit(false);

            String[] sesids = new String[100];

            PreparedStatement pr1 = con.prepareStatement("select sesid from ses");
            ResultSet rs1 = pr1.executeQuery();
            int i = 0;
            while (rs1.next()) {

                sesids[i] = rs1.getString(1);

                i++;

            }
            rs1.close();
            pr1.close();
            con.close();

            con = pool.getConnection();
            LobStat.displayStats(con);

            OraclePreparedStatement pr = (OraclePreparedStatement) con.prepareStatement("select SESOB from ses where sesid=?");

            long t1 = System.currentTimeMillis();
            for (String x : sesids) {


                pr.setString(1, x);
                ResultSet rs = pr.executeQuery();

                while (rs.next()) {

                    byte[] blob = rs.getBytes(1);

                }

                rs.close();
            }

            long t2 = System.currentTimeMillis();
            System.out.println("time taken " + (t2 - t1));

            LobStat.displayStats(con);

            pr.close();

            con.close();

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

Tuesday, August 19, 2014

Nologging vs Logging for LOBs and enq: CF - contention

Changing the logging option is one of the possible performance tuning tasks when dealing with LOBs. However, use of nologging makes recovery of these LOB segments impossible, so whether it can be used depends on the nature of the application data. For transient data, where recovery is not expected in case of a database failure, the nologging option would be suitable. When the logging option is not explicitly specified, the LOB segment inherits the tablespace's logging option. Refer to the Oracle documentation for more logging-related information on basicfile and securefile.
This post presents the results of a simple test comparing nologging vs logging for both basicfile and securefile LOB segments. The Java code used for the test is given at the end of the post. The first case is basicfile. Nologging is not possible if caching is enabled on the LOB segment, therefore the nocache option is used for both the logging and nologging runs. The table definition is given below.
create table ses(sesid varchar2(100), sesob blob) SEGMENT CREATION IMMEDIATE TABLESPACE users
 LOB
  (
    sesob
  )
  STORE AS object_lob_seg (
  TABLESPACE lob32ktbs
  DISABLE STORAGE IN ROW
        CHUNK 32K
        NOCACHE NOLOGGING
        --NOCACHE LOGGING
        PCTVERSION 0
        STORAGE (MAXEXTENTS UNLIMITED)
        INDEX object_lob_idx (
            TABLESPACE lob32ktbs
            STORAGE (MAXEXTENTS UNLIMITED)
        )
    )
/
The test involves inserting an 800KB LOB into the table 100 times and later updating each row with a LOB of similar size. Since the LOB is greater than 4K, the table is created with DISABLE STORAGE IN ROW. Chunk size is set to 32K and the LOB segment is stored in a tablespace of 32K block size (lob32ktbs) while the table resides in a different tablespace (users) of 8K block size. PCTVERSION is set to 0 as no consistent reads are expected. The only option changed between the two runs is the logging option. The test database version is 11.2.0.3.
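Once the table has been created, the resulting LOB storage attributes can be verified from USER_LOBS (a sketch; the column list is as of 11.2):

```sql
-- Confirm chunk size, out-of-row storage, pctversion and logging options
SELECT column_name, chunk, in_row, pctversion, cache, logging, securefile
FROM   user_lobs
WHERE  table_name = 'SES';
```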
The redo size statistic and log file sync wait event times are compared for the two test runs, as these are directly related to the logging option. The graphs below show the comparison for each test run.
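The same figures gathered by the Java helper class at the end of this post can be sampled with plain SQL before and after each run; a sketch for the current session:

```sql
-- Redo generated by the current session
SELECT sn.name, ms.value
FROM   v$mystat ms JOIN v$statname sn ON ms.statistic# = sn.statistic#
WHERE  sn.name = 'redo size';

-- Time spent on log file sync by the current session
SELECT event, total_waits, time_waited_micro
FROM   v$session_event
WHERE  sid = SYS_CONTEXT('USERENV', 'SID')
AND    event = 'log file sync';
```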

There's no surprise that the nologging option generates the lowest amount of redo. The logging test case generated around 83-84MB of redo for both insert and update, which is roughly the same as the amount of LOB data inserted/updated (800KB x 100). There's minimal redo during the nologging test. Since redo is counted for the entire test, the redo seen could be that generated for the table data inserts (as opposed to the LOBs), which still go to a tablespace with logging. Nevertheless a clear difference could be observed in the amount of redo generated when nologging is used. This is also reflected in the log file sync time for the two test cases, which dropped from several minutes to under a minute.
Next the same test was executed, but this time the LOB segment was stored as a securefile. The table DDL is given below. The only difference, apart from the securefile option, is that PCTVERSION 0 has been replaced with RETENTION NONE. All other settings are the same as in the basicfile test (tablespaces, database etc). Chunk size is deprecated for securefiles, and when specified is considered only as a guideline.
create table ses(sesid varchar2(100), sesob blob) SEGMENT CREATION IMMEDIATE TABLESPACE users
 LOB
  (
    sesob
  )
  STORE AS securefile object_lob_seg (
  TABLESPACE lob32ktbs
  DISABLE STORAGE IN ROW
        CHUNK 32K
        NOCACHE NOLOGGING
        --NOCACHE LOGGING
        RETENTION NONE
        STORAGE (MAXEXTENTS UNLIMITED)
        INDEX object_lob_idx (
            TABLESPACE lob32ktbs
            STORAGE (MAXEXTENTS UNLIMITED)
        )
    )
/
Similar to the earlier test, nologging generated a low amount of redo compared to logging and resulted in short waits on the log file sync event.

Comparing redo size and log file sync time for the nologging option between basicfile and securefile shows a mixed bag of results. Basicfile performed well for inserts, reducing both redo generated and log file sync time, while securefile performed well for updates.

Comparing the IO types on the OEM console during the nologging test, it was noted that securefile predominantly uses large writes while basicfile uses small writes.
Comparing the test results with logging enabled, securefile outperforms basicfile for both inserts and updates in terms of log file sync wait time. Both securefile and basicfile generate roughly the same amount of redo. It must be noted that nocache logging is the default for securefile.
The table below shows all the test results.




It seems that when the application permits, it's best to use nologging for LOBs, which reduces the amount of redo generated and the log file sync waits. However there are some drawbacks to using nologging on LOBs which only come to light when there are multiple sessions doing LOB related DMLs. The following is an AWR snippet from a load test on a pre-production system.
After CPU, the top wait event is log file sync, and a high portion of this wait event is due to a basicfile LOB related insert statement that stores some transient data. Changing the LOB segment option to nologging resulted in lower log file sync time, but it also introduced high enq: CF - contention wait times.
According to MOS note 1072417.1, enq: CF - contention is normal and expected when doing DML on LOBs with nocache nologging. CF contention occurs because Oracle records the unrecoverable system change number (SCN) in the control file. From the application perspective the overall response time degraded after changing to nologging, and the option had to be reverted back to cache logging.
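The unrecoverable SCN that nologging writes stamp into the control file is also exposed per datafile; a sketch for observing it (datafiles that have never taken an unrecoverable write show 0):

```sql
-- Datafiles with nologging (unrecoverable) changes recorded in the control file
SELECT file#, unrecoverable_change#, unrecoverable_time
FROM   v$datafile
WHERE  unrecoverable_change# > 0;
```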

Useful metalink notes
Performance Degradation as a Result of 'enq: CF - contention' [ID 1072417.1]
LOB Performance Guideline [ID 268476.1]
LOBS - Storage, Redo and Performance Issues [ID 66431.1]
LOBS - Storage, Read-consistency and Rollback [ID 162345.1]
Master Note - RDBMS Large Objects (LOBs) [ID 1268771.1]
Performance problems on a table that has hundreds of columns including LOBs [ID 1292685.1]
POOR PERFORMANCE WITH LOB INSERTS [ID 978045.1]
Securefiles Performance Appears Slower Than Basicfile LOB [ID 1323933.1]


Java Code Used for Testing
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import oracle.jdbc.OraclePreparedStatement;
import oracle.jdbc.pool.OracleDataSource;

public class LobLoggingTest {

    final String URL = "jdbc:oracle:thin:@192.168.0.66:1521:ent11g2";
    final String USERNAME = "asanga";
    final String PASSWORD = "asa";

    public static void main(String[] args) {

        LobLoggingTest test = new LobLoggingTest();
        //Insert test
        test.insertTest();

        System.out.println("\n\n************* end of insert test **************\n\n");

        //Update test
        test.updateTest();

    }

    public void insertTest() {
        try {

            OracleDataSource pool = new OracleDataSource();
            pool.setURL(URL);
            pool.setUser(USERNAME);
            pool.setPassword(PASSWORD);

            Connection con = pool.getConnection();
            con.setAutoCommit(false);

            long t1 = System.currentTimeMillis();

            LobStat.displayStats(con);
            LobStat.displayWaits(con);

            byte[] x = new byte[800 * 1024];
            x[1] = 10;
            x[798 * 1024] = 20;

            for (int i = 0; i < 100; i++) {

                OraclePreparedStatement pr = (OraclePreparedStatement) con.prepareStatement("insert into ses values(?,?)");

                String sesid = "abcdefghijklmnopqrstuvwxy" + Math.random();
                pr.setString(1, sesid);
                pr.setBytes(2, x);

                pr.execute();
                con.commit();
                pr.close();

            }

            long t2 = System.currentTimeMillis();

            LobStat.displayStats(con);
            LobStat.displayWaits(con);

            con.close();

            System.out.println("time taken " + (t2 - t1));

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

public void updateTest() {
        try {

            OracleDataSource pool = new OracleDataSource();
            pool.setURL(URL);
            pool.setUser(USERNAME);
            pool.setPassword(PASSWORD);

            Connection con = pool.getConnection();
            con.setAutoCommit(false);

            String[] sesids = new String[100];

            PreparedStatement pr1 = con.prepareStatement("select sesid from ses");
            ResultSet rs1 = pr1.executeQuery();
            int i = 0;
            while (rs1.next()) {

                sesids[i] = rs1.getString(1);
                i++;
            }
            rs1.close();
            pr1.close();
            con.close();

            con = pool.getConnection();
            LobStat.displayStats(con);
            LobStat.displayWaits(con);

            OraclePreparedStatement pr = (OraclePreparedStatement) con.prepareStatement("update ses set SESOB=? where sesid=?");

            byte[] xx = new byte[800 * 1024];
            xx[1] = 10;
            xx[798 * 1024] = 20;

            long t1 = System.currentTimeMillis();
            for (String x : sesids) {

                pr.setBytes(1, xx);
                pr.setString(2, x);
                pr.execute();
                con.commit();
            }

            long t2 = System.currentTimeMillis();
            System.out.println("time taken " + (t2 - t1));

            LobStat.displayStats(con);
            LobStat.displayWaits(con);

            pr.close();
            con.close();

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}


import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LobStat {

    public static void displayStats(Connection con) {
        try {
            PreparedStatement pr = con.prepareStatement("select name,value from v$mystat,v$statname where v$mystat.statistic#=v$statname.statistic# "
                    + " and v$statname.name in ('CPU used when call started',"
                    + " 'CPU used by this session','db block gets','db block gets from cache','db block gets from cache (fastpath)',"
                    + " 'db block gets direct','consistent gets','consistent gets from cache','consistent gets from cache (fastpath)',"
                    + " 'consistent gets - examination','consistent gets direct','physical reads','physical reads direct',"
                    + " 'physical read IO requests','physical read bytes','consistent changes','redo size','redo writes',"
                    + " 'lob writes','lob writes unaligned','physical writes direct (lob)','physical writes','physical writes direct','physical writes from cache','physical writes direct temporary tablespace'"
                    + " ,'physical writes direct temporary tablespace','securefile direct read bytes','securefile direct write bytes','securefile direct read ops'"
                    + " ,'securefile direct write ops') order by 1");

            ResultSet rs = pr.executeQuery();


            while(rs.next()){

                System.out.println(rs.getString(1)+" : "+rs.getDouble(2));
            }

            rs.close();
            pr.close();
        } catch (SQLException ex) {
            ex.printStackTrace();
        }

    }

    public static void displayWaits(Connection con){
        try {
            PreparedStatement pr = con.prepareStatement("select event,total_waits,TIME_WAITED_MICRO from  V$SESSION_EVENT where sid=SYS_CONTEXT ('USERENV', 'SID') and event in ('log file sync','enq: CF - contention')");
            ResultSet rs = pr.executeQuery();

            System.out.println("event : total waits : time waited micro");
            while(rs.next()){

                System.out.println(rs.getString(1)+" : "+rs.getLong(2)+" : "+rs.getLong(3));

            }

            rs.close();
            pr.close();
        } catch (SQLException ex) {
            ex.printStackTrace();
        }

    }
}