Tuesday, October 20, 2015

12c Compression Comparison

OLTP table compression, introduced in 11g, is called Advanced Row Compression in 12c. However, the old syntax still works in 12c as well. Advanced Row Compression maintains compression during all types of data manipulation, including conventional DML such as INSERT and UPDATE. Oracle's claim is that the overhead for write operations is minimal, making it suitable for both OLTP systems and data warehouses. Advanced Row Compression requires purchasing the Oracle Advanced Compression option, while basic table compression is a feature of Oracle Database 12c Enterprise Edition (EE).
This post compares the reduction in storage use, the CPU overhead and the redo generated for advanced vs basic vs no compression. It is not an exhaustive test; how much compression helps depends on the data being compressed.
As mentioned earlier, COMPRESS FOR OLTP (the 11g syntax) creates the table with advanced row compression.
create table x (a number, b varchar2(100)) compress for oltp;
select table_name,COMPRESSION,COMPRESS_FOR from  user_tables where table_name='X';

TABLE_NAME COMPRESS COMPRESS_FOR
---------- -------- ------------------------------
X          ENABLED  ADVANCED

create table x (a number, b varchar2(100)) ROW STORE COMPRESS ADVANCED;
select table_name,COMPRESSION,COMPRESS_FOR from  user_tables where table_name='X';

TABLE_NAME COMPRESS COMPRESS_FOR
---------- -------- ------------------------------
X          ENABLED  ADVANCED
If COMPRESS is specified without any option the table gets basic compression; basic can also be specified explicitly.
create table x (a number, b varchar2(100)) compress;
SQL> select table_name,COMPRESSION,COMPRESS_FOR from  user_tables where table_name='X';

TABLE_NAME COMPRESS COMPRESS_FOR
---------- -------- ------------------------------
X          ENABLED  BASIC

create table x (a number, b varchar2(100)) ROW STORE COMPRESS basic;
select table_name,COMPRESSION,COMPRESS_FOR from  user_tables where table_name='X';

TABLE_NAME COMPRESS COMPRESS_FOR
---------- -------- ------------------------------
X          ENABLED  BASIC
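Compression can also be enabled on an existing table. Note that altering the table affects only future inserts; existing rows are compressed only when the segment is rebuilt, for example with ALTER TABLE ... MOVE (a sketch, using the table from above):

```sql
-- new rows inserted from this point on are candidates for compression
alter table x row store compress advanced;

-- rebuild the segment so existing rows are compressed too
-- (this invalidates indexes on x, which must be rebuilt afterwards)
alter table x move row store compress advanced;
```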
For the test case three tables were created: one with no compression, one with advanced compression and one with basic compression. The database version is 12.1.0.2.
create table NOCOMPRESSTABLE(a number, b varchar2(100));
create table ADVCOMPRESSTABLE(a number, b varchar2(100)) ROW STORE COMPRESS ADVANCED;
create table BASICVCOMPRESSTABLE(a number, b varchar2(100)) ROW STORE COMPRESS basic;

SQL> select table_name,COMPRESSION,COMPRESS_FOR from  user_tables where table_name like '%COMPRESSTABLE';

TABLE_NAME           COMPRESS COMPRESS_FOR
-------------------- -------- -------------
NOCOMPRESSTABLE      DISABLED
BASICVCOMPRESSTABLE  ENABLED  BASIC
ADVCOMPRESSTABLE     ENABLED  ADVANCED
Each table consists of two columns and was populated using the following anonymous PL/SQL block. It inserts a unique value into the first column while the second column gets the same value in every row.
begin
    for i in 1 .. 1000000
    loop
    insert into NOCOMPRESSTABLE values (i,'asdfghjkllqwertyuiopzxcvbnm134567890');
    insert into ADVCOMPRESSTABLE values(i,'asdfghjkllqwertyuiopzxcvbnm134567890');
    insert into BASICVCOMPRESSTABLE values(i,'asdfghjkllqwertyuiopzxcvbnm134567890');
    commit;
    end loop;
end;
/
At the end of the inserts the segment size of each table was measured.
select segment_name,bytes/1024/1024 as MB from user_segments where segment_name like '%COMPRESS%';

SEGMENT_NAME          MB
--------------------- ---
ADVCOMPRESSTABLE      14
BASICVCOMPRESSTABLE   47
NOCOMPRESSTABLE       52
The results are no surprise: the segment of the table created with advanced compression is the smallest and the uncompressed table's is the largest. However, this is the most optimistic of cases, where one column contains only a single repeated value. Note also that basic compression compresses rows only during direct-path loads, not conventional inserts; the modest reduction seen for the basic compression table here most likely comes from the PCTFREE 0 default that basic compression applies.

Therefore the test was rerun after recreating the tables, with the inserts modified as below so that some of the values inserted into the second column are duplicates, but not all of them.
begin
    for i in 1 .. 1000000
    loop
--    insert into NOCOMPRESSTABLE values (i,'asdfghjkllqwertyuiopzxcvbnm134567890'||mod(i,500));
    insert into ADVCOMPRESSTABLE values(i,'asdfghjkllqwertyuiopzxcvbnm134567890'||mod(i,500));
--    insert into BASICVCOMPRESSTABLE values(i,'asdfghjkllqwertyuiopzxcvbnm134567890'||mod(i,500));
    commit;
    end loop;
end;
/
Only one table was populated at a time, and the CPU used by the session and the redo size were measured for each test, shown on the graphs below.
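One way to capture these per-session statistics (a sketch: snapshot the values before and after each insert run and take the difference):

```sql
select sn.name, ms.value
from   v$mystat ms
       join v$statname sn on sn.statistic# = ms.statistic#
where  sn.name in ('CPU used by this session', 'redo size');
```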
Strangely, the inserts done with the compression options used less CPU than those done without any compression (the tests were repeated three times and the same pattern was observed), though the difference wasn't huge. On the other hand, the advanced compression option resulted in more redo being generated than basic compression or no compression.
Comparing the segment sizes revealed the following.
select segment_name,bytes/1024/1024 as MB from user_segments where segment_name like '%COMPRESS%';

SEGMENT_NAME          MB
--------------------- ---
ADVCOMPRESSTABLE      55
BASICVCOMPRESSTABLE   50
NOCOMPRESSTABLE       55
It seems basic compression is slightly better than no compression, but advanced compression didn't yield any benefit at all. The reason could be that advanced compression didn't encounter enough duplicate values inside a block to achieve any significant reduction. It's important to remember that compression happens at the block level.
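Before settling on a strategy, the achievable ratio for a given table can be estimated with the DBMS_COMPRESSION package (available since 11.2; the parameter names below follow the 12.1 signature and the scratch tablespace name is an assumption):

```sql
set serveroutput on
declare
    l_blkcnt_cmp    pls_integer;
    l_blkcnt_uncmp  pls_integer;
    l_row_cmp       pls_integer;
    l_row_uncmp     pls_integer;
    l_cmp_ratio     number;
    l_comptype_str  varchar2(100);
begin
    dbms_compression.get_compression_ratio(
        scratchtbsname => 'USERS',   -- scratch tablespace, an assumption
        ownname        => user,
        objname        => 'ADVCOMPRESSTABLE',
        subobjname     => null,
        comptype       => dbms_compression.comp_advanced,
        blkcnt_cmp     => l_blkcnt_cmp,
        blkcnt_uncmp   => l_blkcnt_uncmp,
        row_cmp        => l_row_cmp,
        row_uncmp      => l_row_uncmp,
        cmp_ratio      => l_cmp_ratio,
        comptype_str   => l_comptype_str);
    dbms_output.put_line('Estimated ratio: '||l_cmp_ratio||' ('||l_comptype_str||')');
end;
/
```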
The test was rerun, this time with the tables created in a tablespace with a block size of 32k. The segment sizes at the end of the test are as below.
select segment_name,bytes/1024/1024 as MB from user_segments where segment_name like '%COMPRESS%';

SEGMENT_NAME          MB
--------------------- ---
ADVCOMPRESSTABLE      49
BASICVCOMPRESSTABLE   49
NOCOMPRESSTABLE       54
This time there's a difference between the segment sizes of the tables created with a compression option and the table created without any compression. However, the reduction in segment size is not the same as when all the values were duplicates.
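For reference, a tablespace with a non-default block size requires a buffer cache of that block size to be configured first. A sketch of how such a tablespace might be created (the datafile path, sizes and names here are assumptions):

```sql
-- a buffer cache for the non-default block size must exist first
alter system set db_32k_cache_size = 128M;

-- hypothetical datafile path and sizes
create tablespace tbs32k
    datafile '/u01/oradata/tbs32k01.dbf' size 500M
    blocksize 32K;

create table ADVCOMPRESSTABLE (a number, b varchar2(100))
    row store compress advanced
    tablespace tbs32k;
```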
From this simple test case it seems that, to benefit from a compression strategy, the nature of the data matters more than the compression option being used. The reduction in space is dictated by how many duplicate values are found within a block.