Overview of InfoSphere Data Click

IBM® InfoSphere® Data Click provides self-service data integration so that any business or technical user can integrate data between systems, both on premises and off premises.

InfoSphere Data Click simplifies data integration for users across organizations. Analysts, data scientists, and line-of-business users can retrieve data and populate new systems on demand. For example, an analyst can optimize a business intelligence environment by integrating warehouse data, or line-of-business users can deliver their data for analysis.

Whether the data source includes one table in a database or thousands of objects in a Hadoop cluster, you can get the data that you need with just a few clicks by using InfoSphere Data Click.

Define data integration from source to target

You can create activities to integrate data by using the InfoSphere Data Click browser-based interface. You can create multiple activities that each specify different source-to-target configurations.

When you create an activity, you define the source for the data. You can choose data that you require from a wide variety of data sources, including IBM PureData® (DB2®) and Oracle. You can limit the source to the data that you require, whether it is a single table, multiple databases, or multiple objects that are stored in an Amazon S3 bucket.

When you run an activity that moves data into a Hadoop Distributed File System (HDFS) in IBM InfoSphere BigInsights®, InfoSphere Data Click automatically creates Hive tables and stores them in the target directory. A Hive table is created for each table that you select, and you can specify the location of the target directory where the data is stored. The data types of the columns in the Hive table are assigned based on the metadata for the corresponding columns in the source. You can then use IBM Big SQL in InfoSphere BigInsights to read and analyze the data in the tables.
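To make the idea of deriving Hive column types from source metadata concrete, the following Python sketch builds a Hive table definition from a list of source columns. It is an illustration only: the type mapping, the function names (hive_type, hive_ddl), and the sample table and directory are assumptions, and the sketch does not reproduce the mapping or the statements that InfoSphere Data Click generates.

```python
# Hypothetical mapping from source SQL types to Hive types.
# This is an assumption for illustration, not the product's mapping.
SOURCE_TO_HIVE = {
    "INTEGER": "INT",
    "SMALLINT": "SMALLINT",
    "BIGINT": "BIGINT",
    "DOUBLE": "DOUBLE",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def hive_type(source_type: str) -> str:
    """Return a Hive type for a source column type, defaulting to STRING."""
    base = source_type.strip().upper()
    if base.startswith(("DECIMAL", "NUMERIC")):
        # Keep the precision and scale if the source declares them.
        if "(" in base:
            return "DECIMAL" + base[base.index("("):]
        return "DECIMAL"
    if base.startswith(("VARCHAR", "CHAR")):
        return "STRING"
    return SOURCE_TO_HIVE.get(base, "STRING")


def hive_ddl(table: str, columns: list[tuple[str, str]], target_dir: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for one source table."""
    cols = ",\n  ".join(f"`{name}` {hive_type(ctype)}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)\n"
        f"LOCATION '{target_dir}/{table}'"
    )


# Hypothetical source table and target directory.
ddl = hive_ddl(
    "CUSTOMERS",
    [("ID", "INTEGER"), ("NAME", "VARCHAR(40)"), ("BALANCE", "DECIMAL(10,2)")],
    "/data/click",
)
print(ddl)
```

One table definition is produced for each selected source table, and the target directory appears in the LOCATION clause, which mirrors the behavior that the text describes.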

You also set the policies for the activity, including the amount of data that you can integrate when the activity runs. The policy choices that you make are applied automatically to any runs of that activity.

The following figure shows an example of the New Activity page on which you can specify the source for the data. In this example, the user selected one table in a database as the source. The user expanded the list of assets to view the columns in the table. To view this page, the user first entered some basic information on the Overview page, including a name for the activity. Next, the user will select the target location.

Figure 1. New Activity page in the InfoSphere Data Click interface
The figure shows the New Activity page that is described in the text.

Integrate data on demand

You can view the activities that you create on the InfoSphere Data Click home page with all the other activities that you can run. You can select a previously created activity and run it immediately, or you can review the activity and further customize it. For example, in an activity that integrates data from multiple tables, you can select a subset of the tables or a single table. You can also edit the default names of the target schema and tables.

The following figure shows an example of the page that is displayed when you select an activity and click Run. In this example, the user expanded the table in the available source database to review the table columns. The user can click Finish to run the activity without changes, or the user can remove some columns or click Next to review the target and the options.

Figure 2. Run Activity page in the InfoSphere Data Click interface
The figure shows the Run Activity page that is described in the text.

Monitor data integration activities

You can monitor data integration activities by using InfoSphere Data Click. You can review who ran an activity, the status of the run, the number of rows that were moved by an activity, the date that the activity was submitted, and the date that the activity completed. By reviewing this information, you can determine whether an activity ran successfully and whether the expected number of rows was processed.
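The check that is described above can be sketched in Python. The ActivityRun record, its field names, and the "FINISHED" status value are hypothetical and are chosen only to mirror the monitoring fields that are listed in the text. They are not part of an InfoSphere Data Click API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ActivityRun:
    """Hypothetical record of one activity run, as shown in a monitoring view."""
    run_by: str
    status: str                      # assumed values: "RUNNING", "FINISHED", "FAILED"
    rows_moved: int
    submitted: datetime
    completed: Optional[datetime]    # None while the run is still in progress


def run_succeeded(run: ActivityRun, expected_rows: int) -> bool:
    """Return True if the run finished and moved the expected number of rows."""
    return (
        run.status == "FINISHED"
        and run.completed is not None
        and run.rows_moved == expected_rows
    )


# Sample data for illustration only.
run = ActivityRun(
    run_by="analyst1",
    status="FINISHED",
    rows_moved=1200,
    submitted=datetime(2015, 3, 1, 9, 0),
    completed=datetime(2015, 3, 1, 9, 5),
)
ok = run_succeeded(run, expected_rows=1200)
short = run_succeeded(run, expected_rows=1500)
```

In this sketch, a finished run that moved fewer rows than expected is treated as unsuccessful, which matches the two questions that the monitoring information answers.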

The following figure shows an example of the InfoSphere Data Click home page, including the Activities section and the Monitoring section. The headings for these two sections are highlighted.

Figure 3. Activities and Monitoring areas on the home page of InfoSphere Data Click
The figure shows the InfoSphere Data Click home page that is described in the text.

You can also check the status of activities on the IBM InfoSphere DataStage® and QualityStage® Operations Console by using the View Details option from the InfoSphere Data Click home page. The Operations Console provides a more detailed view of the status of jobs that are generated when you run activities.

Search, browse, and move assets from the Information Governance Catalog

You can also use Information Governance Catalog to identify the data that you want to make available in a target system. Information Governance Catalog is an interactive tool that you can use to search, browse, and query catalog assets from all your data sources. You can assign labels, stewards, and custom attributes to these assets, and you can group assets in a collection. After you identify the assets, you can open InfoSphere Data Click from Information Governance Catalog within the context of those assets and move the data directly into a target database, Amazon S3 bucket, or InfoSphere BigInsights.

Control security

You can define and control which users integrate data by authorizing some users to create and run activities and authorizing other users only to run prebuilt activities. For example, you might authorize technical users to create activities for business users. An information architect can define the source and targets for warehouse data in a business intelligence environment and authorize an analyst to integrate the data. Similarly, an enterprise architect can identify the source and target of the data from line-of-business users and authorize those users to deliver their data.

The InfoSphere Data Click users who create activities define the data source and target, and set the policies. By setting policies, the users who create activities can define the data flows and manage the stress on the system.

Powered by your ETL engine

InfoSphere Data Click uses the highly scalable data integration engine of IBM InfoSphere DataStage to process the data integration activities and to manage the metadata that is associated with the activities.