Creating activities

You create activities to define repeatable data movement configurations. You use the activity creation wizard to specify the source that you want to move data from, the target that you want to move data to, policies that govern how the data is moved, the InfoSphere® Information Server engines that you want to use to move the data, and the InfoSphere DataStage® projects that you want to use in the activities. After you create an activity, users that have access to the activity can run the activity by using the activity run wizard.

Before you begin

You need to have defined source and target assets. You define the assets that you want to use in InfoSphere Data Click by importing them into the metadata repository by using InfoSphere Metadata Asset Manager.

If you want to use an InfoSphere DataStage project other than the default DataClick project in your InfoSphere Data Click activities, you must import an InfoSphere Data Click job template into that project. The job template for the default InfoSphere Data Click project is imported automatically during the installation process.

If you want to use a specific InfoSphere Information Server engine in an activity, you must create an InfoSphere DataStage project and specify the particular engine that you want to use. Then, import an InfoSphere Data Click job template into the InfoSphere DataStage project.

You must have the Data Click Author role to create activities, and an administrator must grant you access to the InfoSphere DataStage project that you are going to use in the InfoSphere Data Click activity.

About this task

When you select the sources that you want to move data from and the targets that you want to move data to, you have the following options:
Table 1. Assets that you can move by using InfoSphere Data Click

Source: Relational database
  Assets that you can move:
    • Database schemas
    • Tables in a schema
    • Individual tables and columns
  Supported targets:
    • Relational databases
    • Hadoop Distributed File System (HDFS) in InfoSphere BigInsights
    • Amazon S3

Source: Amazon S3
  Assets that you can move:
    • Buckets
    • Folders in a bucket
    • Individual objects in the folders
  Supported targets:
    • Relational databases
    • Hadoop Distributed File System (HDFS) in InfoSphere BigInsights
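The source-to-target combinations in Table 1 amount to a simple mapping. The following Python sketch encodes that mapping for illustration; the identifiers are assumptions for this example, not part of any InfoSphere Data Click API:

```python
# Hypothetical encoding of Table 1 (illustrative names, not a Data Click API):
# each source type maps to the set of target types that it supports.
SUPPORTED_TARGETS = {
    "relational_database": {"relational_database", "hdfs_biginsights", "amazon_s3"},
    "amazon_s3": {"relational_database", "hdfs_biginsights"},
}

def is_supported(source, target):
    """Return True if data can be moved from the source type to the target type."""
    return target in SUPPORTED_TARGETS.get(source, set())
```

The five supported combinations correspond to the five activity types that you can select in step 2 of the procedure.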

If the metadata that you specify in InfoSphere Data Click activities changes, you need to reimport the metadata by using InfoSphere Metadata Asset Manager. After you reimport the metadata, existing activities automatically use the updated metadata.

Users are assigned to activities at the project level. To add more users to an activity, use the DirectoryCommand tool to add them to the InfoSphere DataStage project that you specified on the Access panel in the activity creation wizard.

When you delete an activity, all activity runs that are associated with the activity are also deleted.

Procedure

  1. Open InfoSphere Data Click.
  2. Select New and then select Relational Database to BigInsights, Relational Database to Relational Database, Relational Database to Amazon S3, Amazon S3 to Relational Database, or Amazon S3 to InfoSphere BigInsights.
  3. In the Overview pane, enter a name and description for the activity.
  4. In the Sources pane, select the assets that you want to move data from.

    The assets that you select are the only sources that InfoSphere Data Click users can extract data from when they run the activity. Then, click Next.

    Note: If you do not see the sources that you want to use in this activity, click Import Additional Assets, which opens InfoSphere Metadata Asset Manager.
  5. Select the data connection that you want to use to connect to Amazon S3 or the source database and enter the credentials to access the source. Click Next. You are prompted to enter credentials only if the password for the data connection was not saved when you imported metadata for the database by using InfoSphere Metadata Asset Manager or if the password expired or was changed.
  6. In the Target pane, select the database, Amazon S3 bucket, or folder in the distributed file system, such as a Hadoop Distributed File System (HDFS) in InfoSphere BigInsights, to move data to. Then, click Next.

    The target database or folder that you select is the only asset that InfoSphere Data Click users can write data to when they run the activity.

    Note: If you do not see the target that you want to use in this activity, click Import Additional Assets, which opens InfoSphere Metadata Asset Manager.
  7. Select the data connections that you want to use to connect to the target database, Amazon S3, or HDFS and enter the credentials to access them. If the target is an HDFS, enter the credentials to access the Hive table. Click Next. You are prompted to enter credentials only if the password for the data connection was not saved when you imported metadata for the database, HDFS, or the Hive table by using InfoSphere Metadata Asset Manager, or if the password expired or was changed.
  8. In the Access pane, select the InfoSphere DataStage project that you want to use in the activity. If your instance of InfoSphere Data Click is configured to use more than one InfoSphere Information Server engine, select the engine that you want to use. The engine and the project that you select are used to process the InfoSphere DataStage job requests that are generated when you run the activity. The only projects that are displayed are:
    • Projects in which you, as the Data Click Author, have the DataStage Developer role.
    • Projects into which an InfoSphere Data Click job template was imported.
    The only engines that are displayed are engines that are specified in projects that you have access to.

    Then click Next.
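The project-visibility rules in this step can be expressed as a small filter. The Python sketch below is illustrative only; the dictionary keys and data model are assumptions for this example, not how InfoSphere Data Click stores projects:

```python
def visible_projects(projects, author):
    # A project is listed on the Access pane only if the Data Click author
    # holds the DataStage Developer role in it and an InfoSphere Data Click
    # job template was imported into it (hypothetical data model).
    return [
        p for p in projects
        if "DataStage Developer" in p["roles"].get(author, set())
        and p["has_data_click_template"]
    ]
```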

  9. In the Policies pane, specify the maximum number of records that can be moved from each table and the maximum number of tables that can be extracted.
    • If the target you selected is an HDFS, specify the delimiter that you want to use between column values in the Hive table.

      When the activity runs, a Hive table is automatically generated for each table that you select in the Source pane. The field delimiter that you specify separates column values in the Hive table. You can specify any single printable character as the delimiter. You can also specify the following control characters as the delimiter: \t, \b, and \f.

    • If the target that you selected is an Amazon S3 bucket, specify the storage class that defines how you want your data stored when it is moved to Amazon S3.
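The delimiter rule for Hive tables described above can be sketched as follows. The helpers is_valid_delimiter and to_hive_row are illustrative only, not part of InfoSphere Data Click:

```python
# Sketch of the Hive field-delimiter rule: any single printable character
# is allowed, plus the control characters \t, \b, and \f.
ALLOWED_CONTROL = {"\t", "\b", "\f"}

def is_valid_delimiter(ch):
    """Return True if ch is an acceptable Hive field delimiter."""
    return len(ch) == 1 and (ch in ALLOWED_CONTROL or ch.isprintable())

def to_hive_row(values, delimiter):
    """Join column values into one delimited Hive text row."""
    if not is_valid_delimiter(delimiter):
        raise ValueError("delimiter must be one printable character or \\t, \\b, \\f")
    return delimiter.join(str(v) for v in values)
```

For example, with a tab delimiter the row [1, "Ana"] is written as the line "1\tAna" in the generated Hive table file.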
  10. Click Save to save the activity and close the wizard.

What to do next

To run the activity, select the activity in the Monitoring > Activities workspace and click Run. InfoSphere Data Click users who have access to the activity can run it. To add users to an activity, you must grant them the DataStage Operator role and add them to the InfoSphere DataStage project that you specified in Step 8.