Loading data from a file on Amazon S3

You can load data into your cloud database from a data file that you previously uploaded to Amazon S3. The data file cannot be larger than 80% of the free space in your home directory.

Follow these steps to load data to your cloud database from a data file on Amazon S3:

Creating a data file

To be able to load data, you must save the data in a delimited file, such as a comma-separated values (CSV) file.

To create a delimited format file, use a database utility to extract data from your data store or DBMS to a file. Each data file must contain data from one table only.
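For example, a comma-delimited file for a hypothetical FRUIT.GRAPES table might look like the following (the column names and values are illustrative only):

```
GRAPE_ID,VARIETY,COLOR,PRICE
1,Concord,Purple,2.50
2,Thompson,Green,1.99
3,Flame,Red,2.25
```

The first row can optionally carry column headings; if it does, you skip it during the load.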

Tip: If you are creating multiple data files, use the format schema.table for the file names if you want to automatically map files to target tables in your database. For example, if you create a file named FRUIT.GRAPES.gz, then, later during the load process, the file will be automatically mapped to the GRAPES table in the FRUIT schema of your database.
  • Oracle: Unload the data by using a utility such as Oracle Application Express®. Learn more in the Oracle documentation about the unload process: Using Oracle Application Express Utilities.
  • MySQL: Export the data by using a utility such as MySQL Workbench or phpMyAdmin. Learn more in the MySQL documentation about the export process: Data export.
  • Db2®: Export the data by using a command-line utility. Learn more in the Db2 documentation about the export process: Exporting data.
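As a sketch of the Db2 case, the EXPORT command can write a table to a delimited (DEL) file from the Db2 command line processor. The file path and the FRUIT.GRAPES table name here are hypothetical:

```sql
-- Run while connected to the source database.
-- Writes one table's rows to a comma-delimited file.
EXPORT TO /tmp/FRUIT.GRAPES.csv OF DEL
  SELECT * FROM FRUIT.GRAPES;
```

Naming the output file schema.table, as in the tip above, lets the load process map it to the target table automatically.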

Moving a file to Amazon S3

To move a data file to Amazon S3, you can use a client such as S3 Browser (available at http://s3browser.com/download.php and https://s3browser.com/buypro.php). You must first create a bucket (container) in the S3 browser to hold the file. You can move multiple files or a folder of files from your file system to Amazon S3. If you have a large amount of data, Amazon S3 also offers the option of shipping the data to Amazon and allowing Amazon to load the data to Amazon S3.

Learn more:

Amazon S3 documentation: Get Started With Amazon Simple Storage Service

Using DataStage with Amazon S3

You can also use IBM® InfoSphere® DataStage® to extract data from a database and move it to a platform such as Amazon S3. The DataStage Designer client has a palette that contains the tools that form the basic building blocks of a data management job.

DataStage parallel process

With DataStage, you can design and run jobs that include a source data object and a target data object. The source data object is associated with a database and specifies the table name and metadata to extract. The target data object is associated with Amazon S3 and specifies the name and location of the data file that will contain the target data.

Learn more:

DataStage documentation: Designing DataStage and QualityStage® jobs

Creating a target table

Before you load data into your database, you must create a target table for your data. The target table definition must match the structure and data types of the data in the data file.

You can create a target table using one of the following methods:
  • Let Db2® generate the SQL DDL statement to create the table, based on a delimited text file containing your data.
  • Write your own SQL DDL statement to create the table. If you do not already use an application that creates DDL statements, you can use IBM InfoSphere Data Architect. See Generating DDL scripts in the InfoSphere Data Architect Knowledge Center for more information.
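For example, a hand-written DDL statement for a hypothetical FRUIT.GRAPES data file with four columns might look like the following. The column names, order, and data types are assumptions; they must match the structure of your own data file exactly:

```sql
-- Columns must appear in the same order as the fields in the data file.
CREATE TABLE FRUIT.GRAPES (
  GRAPE_ID INTEGER NOT NULL,
  VARIETY  VARCHAR(64),
  COLOR    VARCHAR(32),
  PRICE    DECIMAL(7,2)
);
```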

Loading data from Amazon S3

To load data from Amazon S3, select one of the following methods:
  • From the web console: click Load > Amazon S3 and select Amazon S3 as the source. If you are loading segmented files, select the associated manifest file when you select the files to load.
  • With external tables directly. The following is an example SQL statement:
    INSERT INTO <table-name> SELECT * FROM EXTERNAL '<mys3file.txt>' USING
      (CCSID 1208 s3('s3.amazonaws.com',
        '<S3-access-key-ID>',
        '<S3-secret-access-key>',
        '<my_bucket>'))
  • For improved performance, you can also use the Db2 LOAD command to load data from Amazon S3. The following is the general form of the command:
    CALL SYSPROC.ADMIN_CMD('LOAD FROM "S3::<amazon-s3-URL>::<s3-access-key-id>::<s3-secret-access-key>::<s3-bucket-name>::<path-to-data-file>" OF <filetype> <additional-load-options> INTO <table-name>')
    The following is an example usage of the Db2 LOAD command:
    CALL SYSPROC.ADMIN_CMD('load from "S3::s3-us-west-2.amazonaws.com::<s3-access-key-id>::<s3-secret-access-key>::ibm-state-store::bdidata2TB/web_site.dat" of DEL modified by codepage=1208 coldel0x7c WARNINGCOUNT 1000 MESSAGES ON SERVER INSERT into BDINSIGHTS2.web_site')
    In this example, the codepage=1208 modifier indicates UTF-8 data, and coldel0x7c specifies the vertical bar (|, hex 0x7C) as the column delimiter.

    For supported command options, see: LOAD command.