InfoSphere® Data Click authors
and users can run activities that were created by InfoSphere Data Click authors.
You run activities to move data from a source asset, such as a database
or a bucket that is stored on Amazon S3, to a target database, bucket,
or distributed file system, such as a Hadoop Distributed File System
(HDFS) in InfoSphere BigInsights®.
About this task
You can run activities that appear in the list of activities
on the Home tab. You can review activity runs
in the Monitoring section on the Home tab.
The InfoSphere Data Click
author determines the configuration of the activities that
are available to you.
The activity run wizard prompts you for the information
that is needed before you can run the activity. Each pane has a tab.
A check mark on a tab indicates that the information in that pane is
complete and you do not need to provide additional information. If
a tab has no check mark, you must provide information in that pane. If
all tabs are marked with check marks, you can proceed to the Summary pane
and run the activity with as few as two clicks.
Procedure
- Open InfoSphere Data Click and
select an InfoSphere Data Click activity
in the pane on the left side of the Home tab.
- Click Run.
The InfoSphere Data Click wizard
is displayed and shows the source databases and schemas from which
you can move data.
- In the Source pane, select the data
that you want to move. Then click Next.
- In the Target pane, select the target that you want to
move data to. Then click Next.
- In the Options pane, review the name
of the table that is created in the target database, the names of the
files that are created in the target Amazon S3 bucket, or the Hive
table name that is automatically assigned when you run the activity.
Expand the Advanced section
to change the name that is automatically assigned.
If you are moving data to a target HDFS:
The default Hive table names are assigned by using the following
format: <schema_name>.<source_table_name>.
For example, Jane.NorthEastRegion. The default
Hive schema name is the user name of the user that is running the
activity. You can update the following components:
- Hive table schema: You can use the schema name of the source
table that you specified on the Source pane,
or you can specify a new name. For example, if you specify JaneSmith_Customers,
the Hive schema name is JaneSmith_Customers,
and the Hive table name is JaneSmith_Customers.NorthEastRegion.
- Hive table: You can add a prefix or suffix to the Hive table
name. For example, if you specify 2014 as the table
prefix, the Hive table name is Jane.2014_NorthEastRegion.
If you specify critical as the
table suffix, the Hive table name is Jane.NorthEastRegion_critical.

If you are moving data to a target database:
The default table name is <prefix>_<source_table_name>_<suffix>. If
tables with the same name already exist in your target database, you
can append the new data to the existing tables by selecting Append
to existing tables.

If you are moving data to a target Amazon S3 bucket:
The default file name is <prefix>_<source_table_name>.
Select Include source schema in file name if
you want the name of the schema that you specified as your source
to be included in the name of the file that is created in the target
Amazon S3 bucket. You can add a custom prefix to the file name; if
you add a prefix, it affects where the file is stored.
You can also specify the file structure of the new file, how you want
your data delimited, and so on.
When you select Create
a file that documents its structure in the first row,
the first row of the created file documents how your data is stored
in the source database. For example, if you have a source database
table that stores a customer name and a social security number with
the column names Customer_Name and SSN,
then the file documents the database's format in the first row
as Customer_Name:VarChar(20), SSN:VarChar(11).
The comma is the column delimiter.
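The naming rules described above can be sketched as simple string templates. This is an illustration only: the helper functions are hypothetical, not part of InfoSphere Data Click, and the example names (Jane, NorthEastRegion) come from the examples above.

```python
# Hypothetical sketch of the default naming rules; InfoSphere Data Click
# assigns these names itself when you run the activity.

def hive_table_name(schema, source_table, prefix=None, suffix=None):
    """Default Hive format: <schema_name>.<source_table_name>, with an
    optional prefix or suffix on the table component."""
    table = source_table
    if prefix:
        table = f"{prefix}_{table}"
    if suffix:
        table = f"{table}_{suffix}"
    return f"{schema}.{table}"

def target_db_table_name(source_table, prefix, suffix):
    """Default target-database format: <prefix>_<source_table_name>_<suffix>."""
    return f"{prefix}_{source_table}_{suffix}"

def s3_file_name(source_table, prefix):
    """Default Amazon S3 format: <prefix>_<source_table_name>."""
    return f"{prefix}_{source_table}"

print(hive_table_name("Jane", "NorthEastRegion"))                     # Jane.NorthEastRegion
print(hive_table_name("Jane", "NorthEastRegion", prefix="2014"))      # Jane.2014_NorthEastRegion
print(hive_table_name("Jane", "NorthEastRegion", suffix="critical"))  # Jane.NorthEastRegion_critical
```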
Then click Next.
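A downstream consumer of a file created with the Create a file that documents its structure in the first row option can read the column layout from that first row. The parser below is a hypothetical sketch based only on the Name:Type, Name:Type format shown in the example above; it is not part of InfoSphere Data Click.

```python
# Hypothetical sketch: parse the structure row that documents the
# source database format, e.g. "Customer_Name:VarChar(20), SSN:VarChar(11)".

def parse_structure_row(first_row):
    """Split a structure row into (column_name, column_type) pairs."""
    columns = []
    for field in first_row.split(", "):       # the comma is the column delimiter
        name, col_type = field.split(":", 1)  # each field is Name:Type
        columns.append((name, col_type))
    return columns

print(parse_structure_row("Customer_Name:VarChar(20), SSN:VarChar(11)"))
# [('Customer_Name', 'VarChar(20)'), ('SSN', 'VarChar(11)')]
```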
- Review the information in the Summary pane. You
see the name that was generated for the activity run and the policies
that apply to the activity run.
- Click Run. The activity
is submitted for processing. When the job is processed:
- Data is copied from the source that you selected and moved to
the target that you selected.
- The data that you are moving is registered in the metadata repository,
and is accessible in Information Governance Catalog and
other products in the InfoSphere Information Server suite.
A Hive table is also created for the source table for activities
that move data from a database to a Hadoop Distributed File System
(HDFS).
What to do next
After you run an activity, you can view the status of the
activity run in the
Monitoring section of
InfoSphere Data Click.
You can select an activity and click
Run Again to
generate and run a new instance of the activity.