Wednesday, December 15, 2021

Loading Data to ATP from Object Storage

There are several ways to load data into ATP from object storage. This post shows examples of a few of those options.

Copying data from object storage to a table in ATP
1. The copy_data procedure in the dbms_cloud package is used to achieve this. Before the data can be copied over, the required policies, permissions and credentials must be created. The steps for creating the necessary permissions and credentials are the same as the ones needed to export/import to/from object storage. Implement steps 1 to 8 in the earlier post. Even though it is not directly referenced in the call, access to data_pump_dir is expected. Without it the copy_data call results in an error as below. So be sure to grant read, write access on data_pump_dir to the user schema (a sample grant follows the error output).
begin
DBMS_CLOUD.COPY_DATA ('TEST','ATP_TABLES','https://swiftobjectstorage.uk-london-1.oraclecloud.com/v1/#namespace#/tabledata/test2.csv','ASANGA',format => json_object('type' VALUE 'CSV','skipheaders' value '1'));
end;
Error report -
ORA-20000: ORA-06564: object DATA_PUMP_DIR does not exist
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 1285
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 4290
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 4311
ORA-06512: at line 2
20000. 00000 -  "%s"
*Cause:    The stored procedure 'raise_application_error'
           was called which causes this error to be generated.
*Action:   Correct the problem as described in the error message or contact
           the application administrator or DBA for more information.
2. A table called test already exists in the ATP database with three columns.
desc test
Name Type         
---- ------------ 
A    NUMBER       
B    VARCHAR2(10) 
C    NUMBER
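For reference only (the post assumes the table is already in place), a table matching this description could be created with:

create table test (a number, b varchar2(10), c number);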
3. A csv file exists in the object storage bucket with the data that needs to be copied into the table. The first few lines of this file are shown below.
"A","B","C"
998,"abc 998",20958
999,"abc 999",20979
1000,"abc 1000",21000
1,"abc 1",21
2,"abc 2",42
3,"abc 3",63
4. Execute the copy_data procedure.
begin
DBMS_CLOUD.COPY_DATA (table_name => 'TEST',
                    credential_name => 'ATP_TABLES',
                    file_uri_list => 'https://swiftobjectstorage.uk-london-1.oraclecloud.com/v1/#namespace#/tabledata/test2.csv',
                    schema_name => 'ASANGA',
                    format => json_object('type' VALUE 'CSV','skipheaders' value '1'));
end;
/
5. Check that the data is available in the table. If for whatever reason the copying of data fails, the error message names a log table, as below, which can be queried to find the reason for the failure (see the sketch after the error).
ORA-20003: Reject limit reached, query table "ASANGA"."COPY$11_LOG" for error details
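For example, a quick count verifies the load, and the log table named in the error (the COPY$ suffix differs for each run) holds the details of the rejected records:

select count(*) from test;

-- log table name is taken from the error message above; the suffix varies per run
select * from copy$11_log;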

Creating external table in ATP
1. Another way to load data is to create an external table. It is assumed that all necessary permissions and credentials are in place.
2. Use the create_external_table procedure to create the external table. The external data file used for this external table resides in object storage.
BEGIN  
   DBMS_CLOUD.CREATE_EXTERNAL_TABLE(   
      table_name =>'test_ex_table',   
      credential_name =>'ATP_TABLES',   
      file_uri_list => 'https://swiftobjectstorage.uk-london-1.oraclecloud.com/v1/#namespace#/tabledata/test2.csv',
      format => json_object('type' value 'csv', 'skipheaders' value '1'),   
      field_list => 'A,B,C',   
      column_list => 'A number,   
                     B varchar2(20),
                     C number');
END;
/
3. This will create an external table in the ATP database.
desc test_ex_table
Name Type         
---- ------------ 
A    NUMBER       
B    VARCHAR2(20) 
C    NUMBER
4. The external table can be validated with the validate_external_table procedure.
exec DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (table_name => 'test_ex_table', schema_name => 'ASANGA');
If there are issues in the data file used for the external table, the procedure will give an output similar to the one below. The log table named in the error can be queried to find out the reason for the failure (an example follows the error output).
Error starting at line : 53 in command -
BEGIN DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (table_name =>'test_ex_table', 	schema_name => 'ASANGA'); END;
Error report -
ORA-20003: Reject limit reached, query table "ASANGA"."VALIDATE$14_LOG" for error details
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 1259
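For example, querying the log table named in the error above:

select * from "ASANGA"."VALIDATE$14_LOG";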
5. The external table can be queried like any other table.
select * from test_ex_table;
Once all verification steps are run, the data from the external table can be loaded into an ATP "internal" table with a create table as select statement.
create table test as select * from test_ex_table;
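Alternatively, if the target table already exists, an insert ... select achieves the same:

insert into test select * from test_ex_table;
commit;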

Creating external table with dump file
1. The only difference between this method and the previous one is the source of the data. In the previous case it was a csv file, whereas here it is a dump file. The dump file must be created using the ORACLE_DATAPUMP access driver. In this example the following table's data is exported to a dump file using the access driver, as shown below.
SQL> desc DATETEST
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 VERSION                                            VARCHAR2(20)
 INDATE                                             DATE

SQL> CREATE TABLE datetest_ex
    ORGANIZATION EXTERNAL
    (
      TYPE ORACLE_DATAPUMP
      DEFAULT DIRECTORY dumpdir
      LOCATION ('datetest.dmp')
    )
    AS SELECT * FROM datetest;
The above SQL will generate a dump file in the database directory dumpdir. Upload this to the object storage (one option is sketched below).
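If the dump file sits in a directory of a database that has the dbms_cloud package (as ATP does), one way to upload it is the put_object procedure; a sketch assuming the same ATP_TABLES credential and tabledata bucket as the rest of this post (otherwise upload via the OCI console or CLI):

BEGIN
   DBMS_CLOUD.PUT_OBJECT(
      credential_name => 'ATP_TABLES',
      object_uri      => 'https://swiftobjectstorage.uk-london-1.oraclecloud.com/v1/#namespace#/tabledata/datetest.dmp',
      directory_name  => 'DUMPDIR',
      file_name       => 'datetest.dmp');
END;
/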

2. Create the external table using the dump file. For dump files the type is specified as datapump.
BEGIN
   DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name =>'test_dmp_table',
    credential_name =>'ATP_TABLES',
    file_uri_list =>'https://swiftobjectstorage.uk-london-1.oraclecloud.com/v1/#namespace#/tabledata/datetest.dmp',
    format => json_object('type' value 'datapump', 'rejectlimit' value '1'),
    column_list => 'VERSION VARCHAR2(20), INDATE date' );
END;
/
3. The validation of the external table can be done the same way as before, with the validate_external_table procedure.
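For example, for the dump-file table created above:
exec DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (table_name => 'test_dmp_table', schema_name => 'ASANGA');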
4. Query and move data to an internal table as needed.
select * from test_dmp_table;
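Once verified, the data can be copied into an internal table with a create table as select, as before; a sketch using a hypothetical table name datetest_copy:

create table datetest_copy as select * from test_dmp_table;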

Creating external partitioned table in ATP
A variant of the external table is the external partitioned table. In this example, two csv files contain the data for the table. Each file contains a specific range of values: the file test_1000.csv has data from 1 to 1000 in column A, while the file test_2000.csv has values from 1001 to 2000. Range partitions are created to encapsulate these ranges.
1. The create command for the external partitioned table is shown below.
BEGIN  
   DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
      table_name =>'test_ex_part_table',  
      credential_name =>'ATP_TABLES',
      format => json_object('type' value 'csv', 'skipheaders' value '1'), 
      column_list => 'A number, B varchar2(20), C number',
      partitioning_clause => 'partition by range (A)
                                (
                                partition p1 values less than (1000) location (''https://swiftobjectstorage.uk-london-1.oraclecloud.com/v1/#namespace#/tabledata/test_1000.csv''),
                                partition p2 values less than (2000) location (''https://swiftobjectstorage.uk-london-1.oraclecloud.com/v1/#namespace#/tabledata/test_2000.csv'')
                                )'
     );
END;
/
2. Validate the external partitioned table with
exec DBMS_CLOUD.VALIDATE_EXTERNAL_PART_TABLE (table_name => 'test_ex_part_table', schema_name => 'ASANGA');
Similar to the external table, if validation encounters any errors, a log table will be named in the error message.
3. Query the external partitioned table. As with normal partitioned tables, partition pruning will happen depending on the predicates used, as in the sketch below.
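For example, given the ranges above, the following query should be satisfied from partition p1 alone, which can be confirmed from the Pstart/Pstop columns of the execution plan:

select * from test_ex_part_table where a < 500;

-- confirm pruning in the plan
explain plan for select * from test_ex_part_table where a < 500;
select * from table(dbms_xplan.display);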

Creating hybrid partitioned table in ATP
A hybrid partitioned table has data sources that are both internal and external. For example, a few partitions could be created over external data files (similar to external tables) while another partition stores its data internally in the table itself. The external data sources are specified with the keywords "external location". Internal partitions do not have any reference to a location.
1. The hybrid partitioned table below uses the same external data files as the previous case and has an additional internal partition for values from 2000 up to (but not including) 3000.
BEGIN  
   DBMS_CLOUD.CREATE_HYBRID_PART_TABLE(
      table_name =>'test_hyb_part_table',  
      credential_name =>'ATP_TABLES',
      format => json_object('type' value 'csv', 'skipheaders' value '1'), 
      column_list => 'A number, B varchar2(20), C number',
      partitioning_clause => 'partition by range (A)
                                (
                                partition p1 values less than (1000) external location (''https://swiftobjectstorage.uk-london-1.oraclecloud.com/v1/#namespace#/tabledata/test_1000.csv''),
                                 partition p2 values less than (2000) external location (''https://swiftobjectstorage.uk-london-1.oraclecloud.com/v1/#namespace#/tabledata/test_2000.csv''),
                                 partition p3 values less than (3000)
                                 )'
     );
END;
/
2. Validation of the hybrid partitioned table can be done with
exec DBMS_CLOUD.VALIDATE_HYBRID_PART_TABLE (table_name => 'test_hyb_part_table', schema_name => 'ASANGA');
3. Similar to the external partitioned table, partition pruning based on the predicates will happen when the table is queried. Since the table also has an internal partition, rows in that partition's range can be inserted directly, as in the sketch below.
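For example, DML against a hybrid partitioned table is only allowed on its internal partitions, so a row in p3's range can be inserted and queried back with a pruning predicate (a sketch with made-up values):

-- lands in internal partition p3 (values from 2000 up to 3000)
insert into test_hyb_part_table values (2500, 'abc 2500', 52500);
commit;

-- predicate prunes to partition p3 only
select * from test_hyb_part_table where a >= 2000;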