Running activities

InfoSphere® Data Click authors and users can run activities that were created by InfoSphere Data Click authors. You run activities to move data from a source asset, such as a database or a bucket that is stored on Amazon S3, to a target database, bucket, or distributed file system, such as a Hadoop Distributed File System (HDFS) in InfoSphere BigInsights®.

Before you begin

You must have the operations database schema for the IBM® InfoSphere DataStage® and QualityStage® Operations Console installed and configured before you run activities. You must also start the AppWatcher process before you run the first activity. After you start the process once, it starts automatically when you run subsequent activities.

You must have access to an activity that was created by an InfoSphere Data Click author.

About this task

You can run activities that appear in the list of activities on the Home tab. You can review activity runs in the Monitoring section on the Home tab. The configuration of activities that are available to you is set by the InfoSphere Data Click author.

The activity run wizard prompts you for any information that is needed before you can run the activity. Each pane has a tab. A check mark on a tab indicates that the information in that pane is complete and you do not need to provide anything more. If a tab has no check mark, you must provide information in that pane. If all tabs have check marks, you can proceed to the Summary pane and run the activity with as few as two clicks.

Procedure

  1. Open InfoSphere Data Click and select an InfoSphere Data Click activity in the pane on the left side of the Home tab.
  2. Click Run.

    The InfoSphere Data Click wizard is displayed showing the source databases and schemas that you can move.

  3. In the Source pane, select the data that you want to move. Then click Next.
  4. In the Target pane, select the target that you want to move data to. Then click Next.
  5. In the Options pane, review the name of the table that is created in the target database, the files that are created in the target Amazon S3 bucket, or the Hive table name that is automatically created when you run the activity. Expand the Advanced section to change the name that is automatically assigned.
    If you are moving data to a target HDFS
    The default Hive table names are assigned by using the following format: <schema_name>.<source_table_name>. For example, Jane.NorthEastRegion. The default Hive schema name is the user name of the user who runs the activity. You can update the following components:
    • Hive table schema

      You can use the schema name of the source table that you specified in the Source pane, or you can specify a new name. For example, if you specify JaneSmith_Customers, the Hive schema name is JaneSmith_Customers, and the Hive table name is JaneSmith_Customers.NorthEastRegion.

    • Hive table
      You can add a prefix or suffix to the Hive table name.
      • For example, if you specify 2014 as the table prefix, the Hive table name is Jane.2014_NorthEastRegion.
      • For example, if you specify critical as the table suffix, the Hive table name is Jane.NorthEastRegion_critical.
    If you are moving data to a target database
    The default table name is <prefix>_<source_table_name>_<suffix>.

    If tables with the same name already exist in your target database, you can append the new data to the existing tables by selecting Append to existing tables.

    If you are moving data to a target Amazon S3 bucket
    The default file name is <prefix>_<source_table_name>. Select Include source schema in file name if you want to include the name of the schema that you specified as your source in the name of the file that is created in the target Amazon S3 bucket.

    You can add a custom prefix to the file name. If you add a prefix to the file name, it affects where the file is stored.

    You can specify the file structure of the new file, how you want your data delimited, and so on.

    When you select Create a file that documents its structure in the first row, the first row of the created file documents how your data is stored in the source database. For example, if a source database table stores a customer name and a social security number in columns named Customer_Name and SSN, the first row of the file is Customer_Name:VarChar(20),SSN:VarChar(11). The comma is the column delimiter.

    Then click Next.
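The default naming rules that the Options pane applies can be sketched in a few lines. This is an illustrative sketch only, not the InfoSphere Data Click implementation; the function names and the underscore separators for prefixes and suffixes are assumptions based on the examples above.

```python
# Illustrative sketch (not product code): how the default target names
# described in the Options pane are assembled.

def hive_table_name(schema, table, prefix=None, suffix=None):
    """Default Hive name: <schema_name>.<source_table_name>,
    with an optional prefix or suffix on the table part."""
    name = table
    if prefix:
        name = f"{prefix}_{name}"   # e.g. prefix "2014" -> 2014_NorthEastRegion
    if suffix:
        name = f"{name}_{suffix}"   # e.g. suffix "critical" -> NorthEastRegion_critical
    return f"{schema}.{name}"

def database_table_name(table, prefix=None, suffix=None):
    """Default target database table name: <prefix>_<source_table_name>_<suffix>."""
    return "_".join(part for part in (prefix, table, suffix) if part)

print(hive_table_name("Jane", "NorthEastRegion"))                 # Jane.NorthEastRegion
print(hive_table_name("Jane", "NorthEastRegion", prefix="2014"))  # Jane.2014_NorthEastRegion
```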
  6. Review the information in the Summary pane. You see the name that was generated for the activity run and the policies that apply to the activity run.
  7. Click Run. The activity is submitted for processing. When the job is processed:
    • Data is copied from the source that you selected and moved to the target that you selected.
    • The data that you are moving is registered in the metadata repository, and is accessible in Information Governance Catalog and other products in the InfoSphere Information Server suite.
    • A Hive table is also created for the source table for activities that move data from a database to a Hadoop Distributed File System (HDFS).
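If you selected Create a file that documents its structure in the first row, a consumer of the target file can recover the column names and types from that first row. The following is a minimal sketch, assuming the Name:Type, comma-delimited format shown in the example above:

```python
# Sketch: parse the structure row of a file created with
# "Create a file that documents its structure in the first row".
# Each column appears as <name>:<type>; the comma is the column delimiter.
first_row = "Customer_Name:VarChar(20),SSN:VarChar(11)"

columns = [tuple(field.split(":", 1)) for field in first_row.split(",")]
print(columns)  # [('Customer_Name', 'VarChar(20)'), ('SSN', 'VarChar(11)')]
```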

What to do next

After you run an activity, you can view the status of the activity run in the Monitoring section of InfoSphere Data Click. You can select an activity and click Run Again to generate and run a new instance of the activity.