You create activities to define repeatable data movement
configurations. You use the activity creation wizard to specify the
source that you want to move data from, the target that you want to
move data to, policies that govern how the data is moved, the InfoSphere® Information Server engines
that you want to use to move the data, and the InfoSphere DataStage® projects that you want to use
in the activities. After you create an activity, users who have access
to the activity can run it by using the activity run wizard.
Before you begin
You need to have defined source and target assets. You
define the assets that you want to use in InfoSphere Data Click by
importing them into the metadata repository by using InfoSphere Metadata Asset Manager.
If you want to use an InfoSphere DataStage project other than the
default DataClick project in your InfoSphere Data Click activities,
you must import an InfoSphere Data Click job template into that
project. The job template for the default InfoSphere Data Click
project is imported automatically during installation.
If you want to use a specific InfoSphere Information Server engine
in an activity, you must create an InfoSphere DataStage project that
specifies that engine, and then import an InfoSphere Data Click job
template into the project.
You must have the Data Click Author role to create activities, and
an administrator must grant you access to the InfoSphere DataStage
project that you are going to use in the InfoSphere Data Click activity.
About this task
When you select the sources that you want to move data
from and the targets that you want to move data to, you have the following
options:
Table 1. Assets that you can move by using InfoSphere Data Click

| Source | Assets you can move | Supported targets |
| --- | --- | --- |
| Relational database | Database schemas; tables in a schema; individual tables and columns | Relational databases; Hadoop Distributed File System (HDFS) in InfoSphere BigInsights; Amazon S3 |
| Amazon S3 | Buckets; folders in a bucket; individual objects in the folders | Relational databases; Hadoop Distributed File System (HDFS) in InfoSphere BigInsights |
If the metadata that you specify in InfoSphere Data Click activities
changes, you need to reimport the metadata by using InfoSphere Metadata Asset Manager.
After you reimport the metadata, existing activities automatically
use the updated metadata.
Users are assigned to activities at the project level. To add users
to an activity, use the DirectoryCommand tool to add them to the
InfoSphere DataStage project that you specified on the Access panel
of the activity creation wizard.
When you delete an activity, all activity
runs that are associated with the activity are also deleted.
Procedure
- Open InfoSphere Data Click.
- Select New and then select Relational
Database to BigInsights, Relational Database
to Relational Database, Relational Database
to Amazon S3, Amazon S3 to Relational Database,
or Amazon S3 to InfoSphere BigInsights.
- In the Overview pane, enter a name
and description for the activity.
- In the Sources pane, select the assets
that you want to move data from.
The assets that you select are the only sources that InfoSphere Data Click users
can extract data from when they run the activity. Then,
click Next.
Note: If you do not see the sources that you want to use in this
activity, click Import Additional Assets, which launches InfoSphere Metadata Asset Manager.
- Select the data connection that you want to use to connect
to Amazon S3 or the source database and enter the credentials to access
the source. Click Next. You are
prompted to enter credentials only if the password for the data connection
was not saved when you imported metadata for the database by using InfoSphere Metadata Asset Manager or
if the password expired or was changed.
- In the Target pane, select the database, Amazon S3 bucket,
or folder in the distributed file system, such as a Hadoop Distributed
File System (HDFS) in InfoSphere BigInsights, to move data
to. Then, click Next.
The target database or folder that you select is the only asset that
InfoSphere Data Click users can write data to when they run the activity.
Note: If you do not see the target that you want to use in this
activity, click Import Additional Assets, which launches InfoSphere Metadata Asset Manager.
- Select the data connections that you want to use to connect
to the target database, Amazon S3, or HDFS and enter the credentials
to access them. If the target is an HDFS, enter the credentials to
access the Hive table. Click Next. You are
prompted to enter credentials only if the password for the data connection
was not saved when you imported metadata for the database, HDFS, or
the Hive table by using InfoSphere Metadata Asset Manager,
or if the password expired or was changed.
- In the Access pane, select the InfoSphere DataStage project
that you want to use in the activity. If your instance of InfoSphere Data Click is
configured to use more than one InfoSphere Information Server engine,
select the engine that you want to use. The engine and
the project that you select are used to process the InfoSphere DataStage job
requests that are generated when you run the activity. The only projects
that are displayed are:
- Projects in which you, as the Data Click Author, have the DataStage Developer role.
- Projects into which you imported InfoSphere Data Click job templates.
The only engines that are displayed are engines that are specified
in projects that you have access to. Then click Next.
- In the Policies pane, specify the maximum number of records that
can be moved from each table and the maximum number of tables to
extract.
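These policy values act as simple upper bounds on each run. A minimal sketch of how such limits might be applied (the function and parameter names here are illustrative, not part of the Data Click API; Data Click enforces its policies internally):

```python
# Hypothetical sketch: trim a table selection to activity policy limits.
# Names are illustrative only; InfoSphere Data Click applies these
# limits itself when an activity runs.

def apply_policies(tables, max_tables, max_records_per_table):
    """Cap the number of tables and the record count moved per table."""
    selected = tables[:max_tables]  # extract at most max_tables tables
    return [
        (name, min(row_count, max_records_per_table))  # cap per-table records
        for name, row_count in selected
    ]

tables = [("CUSTOMERS", 1_200_000), ("ORDERS", 450_000), ("REGIONS", 42)]
print(apply_policies(tables, max_tables=2, max_records_per_table=500_000))
```

With the limits shown, only the first two tables are moved, and the CUSTOMERS extract is capped at 500,000 records.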
- If the target you selected is an HDFS, specify the delimiter that
you want to use between column values in the Hive table.
When the activity runs, a Hive table is automatically generated for
each table that you select in the Source pane. The field delimiter
that you specify separates columns in the Hive table. You can specify
any single printable character as a delimiter. In addition, you can
specify the control characters \t, \b, and \f as a delimiter.
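The delimiter rule above can be sketched as a small check, together with the standard HiveQL clause that a delimited table definition carries (the helper names are illustrative, not part of Data Click; `ROW FORMAT DELIMITED FIELDS TERMINATED BY` is standard Hive DDL):

```python
# Sketch of the delimiter rule: any single printable character, plus
# the control characters \t, \b, and \f. Helper names are illustrative.

ALLOWED_CONTROL = {"\t", "\b", "\f"}

def is_valid_delimiter(ch):
    """True for one printable character, or one of \t, \b, \f."""
    if len(ch) != 1:
        return False
    return ch in ALLOWED_CONTROL or ch.isprintable()

def hive_row_format(delim):
    """Build the HiveQL clause that a generated table definition uses."""
    if not is_valid_delimiter(delim):
        raise ValueError("delimiter must be one printable char or \\t, \\b, \\f")
    return "ROW FORMAT DELIMITED FIELDS TERMINATED BY '%s'" % delim

print(hive_row_format("|"))      # a printable delimiter is accepted
print(is_valid_delimiter("\n"))  # newline is not in the allowed set: False
```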
- If the target that you selected is an Amazon S3 bucket, specify
the storage class that defines how you want your data stored when
it is moved to Amazon S3.
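For orientation, a sketch of validating such a choice against common Amazon S3 storage classes (these class names come from general Amazon S3 documentation, not from Data Click; the wizard lists whichever classes your configuration actually supports):

```python
# Common Amazon S3 storage classes (general S3 knowledge; illustrative
# only -- the activity wizard shows the classes available to you).
STORAGE_CLASSES = {
    "STANDARD": "Frequently accessed data (default)",
    "STANDARD_IA": "Infrequently accessed data, lower storage cost",
    "REDUCED_REDUNDANCY": "Noncritical, easily reproducible data",
}

def choose_storage_class(name):
    """Validate a storage-class name against the known set."""
    if name not in STORAGE_CLASSES:
        raise ValueError("unknown storage class: %s" % name)
    return name

print(choose_storage_class("STANDARD_IA"))
```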
- Click Save to save the activity
and close the wizard.
What to do next
To run the activity, select it and click Run.
InfoSphere Data Click users
who have access to the activity can run it. To add users to an activity,
you must grant them the DataStage Operator
role and add them to the
InfoSphere DataStage project
that you specified in Step
8.