Loading virtual MDM data with InfoSphere DataStage

Extract, transform, and load (ETL) operations are used to extract data from an InfoSphere® MDM database and load it to downstream systems, and to extract data from either a single source or multiple sources and load it to MDM. Certain MDM APIs and utilities are integrated with InfoSphere DataStage® to provide ETL operations for your customer data.

The following diagram shows the flow of virtual MDM project metadata from InfoSphere MDM to InfoSphere Information Server, and how to create InfoSphere DataStage jobs that use MDM APIs.

Image shows the integration flow of metadata from MDM to Information Server and the use of the data in DataStage.

You must have InfoSphere MDM installed. You must also have InfoSphere Information Server installed. When you install InfoSphere Information Server, specific MDM API functions become available to support your ETL operations in InfoSphere DataStage. See the installation requirements for each product for details.

After you install MDM and have your operational server configured and running, you must configure an MDM project in MDM Workbench or import an existing operational server project into MDM Workbench.

After you have created a virtual MDM project, use the Export wizard in MDM Workbench to package your project metadata. The Export wizard saves your virtual MDM project metadata in a single XMI file. This XMI file is used to create the base model and project configuration (assets) for the InfoSphere DataStage jobs that use the MDM APIs.

Use the Master Data Management bridge in InfoSphere Metadata Asset Manager to import your MDM assets to InfoSphere Information Server. After they are imported, these assets are available for use in InfoSphere DataStage ETL jobs.

InfoSphere DataStage is a data integration tool for designing, developing, and running jobs that move and transform data. The Designer Client is the application that you use to create jobs that use the MDM APIs. There are three ways in which you can use the APIs in your jobs.

MDM APIs that enable you to search, read, and write data are integrated with the InfoSphere DataStage MDM Connector stage. When you create a job in InfoSphere DataStage that uses the MDM Connector stage, you can specify which MDM API (mode) you want.

Use the master data extract sample in InfoSphere DataStage to extract data from an MDM database to external files. This process is often used to view incremental data changes. The sample contains four jobs that run a complete data extract and use the MDM Connector stage in member get mode.

A further sample configures the InfoSphere DataStage Java Integration stage to use the MDM MPXDATA utility. Use this stage to read your customer data extract files and prepare the data to be loaded to and used by MDM.

The MDM Connector stage is installed with InfoSphere DataStage. MDM samples can be downloaded from the IBM Samples and Assets site.

Before you begin your ETL operations, you must first export your virtual MDM project metadata to InfoSphere Information Server. Use the Export wizard in MDM Workbench to create the metadata file. Then use the Master Data Management bridge in InfoSphere Metadata Asset Manager to import the metadata.
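
Because the exported metadata is a single XMI file, and XMI is an XML-based format, it can be useful to confirm that the file is well-formed XML before you import it. The following is a minimal sketch of such a check, not part of InfoSphere MDM or InfoSphere Information Server. It uses only the Python standard library; the function name summarize_xmi is hypothetical, and the sketch assumes nothing about the element names inside the file.

```python
import xml.etree.ElementTree as ET


def summarize_xmi(source):
    """Parse an exported XMI file and report its basic shape.

    `source` is a file path or a binary file object. The function is a
    hypothetical helper for illustration; it only checks that the file is
    well-formed XML and does not validate it against any MDM schema.
    """
    # ET.parse raises xml.etree.ElementTree.ParseError if the content
    # is not well-formed XML.
    tree = ET.parse(source)
    root = tree.getroot()
    # Return the root element's tag and the number of its direct children.
    return root.tag, len(list(root))
```

For example, calling summarize_xmi with the path of your exported file returns the root element name and the number of top-level elements. A ParseError at this point suggests that the export is incomplete or damaged and should be repeated before you run the import.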

Tip: If you previously used Clover ETL graphs in your MDM implementation, see the migration topic to learn which InfoSphere DataStage features you can use.


Last updated: 23 October 2014