InfoSphere Information Server integration scenarios

Information integration is a complex activity that affects every part of an organization. To address the most common integration business problems, these integration scenarios show how you can deploy and use IBM® InfoSphere® Information Server and the InfoSphere Foundation Tools components together in an integrated fashion. The integration scenarios focus on data quality within a data warehouse implementation.

Data integration challenges

Today, organizations face a wide range of information-related challenges: varied and often unknown data quality problems, disputes over the meaning and context of information, multiple complex transformations to manage, existing integration processes that should be reused rather than duplicated, ever-increasing quantities of data, shrinking processing windows, and a growing need for monitoring and security to ensure compliance with national and international law.

Organizations must streamline and connect information and systems across enterprise domains with an integrated information infrastructure. Disconnected information leaves IT organizations unable to respond rapidly to new information requests from business users and executives. With few tools or resources to track the information sprawl, it is also difficult for businesses to monitor data quality and consistently apply business rules. As a result, information remains scattered across the enterprise under a multitude of disorganized categories and incompatible descriptions.

Some key data integration issues include:

Enterprise application source metadata is not easily assembled in one place to understand what is actually available. The mix can also include legacy sources, which often do not make metadata available through a standard application programming interface (API), if at all.
Master reference data, such as names and addresses of suppliers and customers or part numbers and descriptions, differs across the applications and duplicate sources that hold this data.
Hundreds of extract, transform, and load (ETL) jobs need to be written to move data from all the sources to the new target application.
Data transformations are required before loading the data so it will fit into the new environment structures.
The process must be able to handle large volumes of data and still finish on time. Companies need the infrastructure to run any of the transformation and data-matching routines on demand.
No consolidated view of data quality across the organization is available.

InfoSphere Information Server integration solution

InfoSphere Information Server and InfoSphere Foundation Tools components are specifically designed to help organizations address the data integration challenges and build a robust information architecture that leverages existing IT investments. The solution offers a proven approach to identifying vital information; specifying how, when, and where it should be made available; determining data management processes and governance practices; and aligning the use of information to match an organization's business strategy.

InfoSphere Foundation Tools components help your organization profile, model, define, monitor, and govern your information. By integrating the solutions provided by the InfoSphere Foundation Tools components, your organization can discover and design your information infrastructure and start building trusted information across the organization.

The IBM InfoSphere Information Server platform consists of multiple product modules that you can deploy together or individually within your enterprise integration framework, as shown in Figure 1. InfoSphere Information Server is designed to flexibly integrate with existing organizational data integration processes to address the continuous cycle of discovery, design, and governance in support of enterprise projects.

Figure 1. The InfoSphere Information Server platform supports your data integration processes.

Shows the IBM InfoSphere Information Server integrated modules

Figure 2 illustrates the components and the metadata they generate, consume, and share.

Typically, the process starts with defining data models. An organization can import information from IBM Industry Data Models (available in InfoSphere Data Architect), which include glossary, logical, and physical data models. The glossary model contains thousands of industry-standard terms that can be used to pre-populate InfoSphere Information Governance Catalog. Organizations can modify and extend the IBM Industry Data Models to match their particular business requirements.

Figure 2. InfoSphere Information Server product modules

Shows how an organization can leverage the IBM InfoSphere Information Server architecture to maximize application development activities via the unified metadata repository.

After the data models are defined and business context is applied, the analyst runs a data discovery process against the source systems that will be used to populate the new target data model. During the discovery process, the analyst can identify key relationships, transformation rules, and business objects that can enhance the data model, if these business objects were not previously defined by the IBM Industry Data Models.

From the discovered information, the analyst can expand the work to focus on data quality assessment and ensure that anomalies are documented, reference tables are created, and data quality rules are defined. The analyst can link data content to established glossary terms to ensure appropriate context and data lineage, deliver analytical results and inferred models to developers, and test and deploy the data quality rules. When the data quality rules are applied to data from source systems, exceptions to the rules can be tracked in IBM InfoSphere Data Quality Console.
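As an illustration only, the idea of applying data quality rules to source records and tracking the exceptions can be sketched in a few lines of Python. This is not the InfoSphere Information Analyzer or InfoSphere Data Quality Console API; the names QualityRule, find_exceptions, and REFERENCE_COUNTRIES, and the sample records, are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

# Hypothetical reference table created during the quality assessment.
REFERENCE_COUNTRIES = {"US", "DE", "FR"}


@dataclass(frozen=True)
class QualityRule:
    """A named check; `check` returns True when the record passes."""
    name: str
    check: Callable[[Record], bool]


def find_exceptions(records: List[Record],
                    rules: List[QualityRule]) -> List[Tuple[int, str]]:
    """Return (record index, rule name) for every rule a record fails."""
    exceptions = []
    for index, record in enumerate(records):
        for rule in rules:
            if not rule.check(record):
                exceptions.append((index, rule.name))
    return exceptions


# Hypothetical rules: a completeness check and a reference-table check.
RULES = [
    QualityRule("customer_id_present",
                lambda r: bool(r.get("customer_id"))),
    QualityRule("country_in_reference_table",
                lambda r: r.get("country") in REFERENCE_COUNTRIES),
]

# Hypothetical source records.
RECORDS = [
    {"customer_id": "C001", "country": "US"},
    {"customer_id": "", "country": "DE"},
    {"customer_id": "C003", "country": "XX"},
]

EXCEPTIONS = find_exceptions(RECORDS, RULES)
for index, rule_name in EXCEPTIONS:
    print(f"record {index} failed rule {rule_name}")
```

Keeping each rule as a named, independent check means that an exception can be reported against the rule that produced it, which is the kind of information an exception-tracking tool would need.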

The analyst is now ready to create the mapping specifications, which are input into the ETL jobs for the new application. Using the business context, discovered information, and data quality assessment results, the analyst defines the specific transformation rules necessary to convert the data sources into the correct format for the IBM Industry Data Model target. During this process, the analyst not only defines the specific business transformation rules, but can also define the direct relationship between the business terms and their representation in physical structures. These relationships can then be published to InfoSphere Information Governance Catalog so that other users can consume them and better understand the asset relationships.
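To make the idea of a mapping specification more concrete, the following Python sketch models one as a list of source-to-target column mappings, each carrying a transformation rule and a link to a business term. It is a simplified illustration under stated assumptions, not the format that InfoSphere Information Server uses; the Mapping class, the helper functions, the column names, and the business terms are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mapping:
    """One row of a hypothetical mapping specification."""
    source_column: str
    target_column: str
    business_term: str
    transform: Callable[[str], str]


def apply_mappings(row: Dict[str, str],
                   mappings: List[Mapping]) -> Dict[str, str]:
    """Convert one source row into the target format."""
    return {m.target_column: m.transform(row[m.source_column])
            for m in mappings}


def term_to_columns(mappings: List[Mapping]) -> Dict[str, List[str]]:
    """Relate each business term to the target columns that represent it."""
    links: Dict[str, List[str]] = {}
    for m in mappings:
        links.setdefault(m.business_term, []).append(m.target_column)
    return links


# Hypothetical specification: trim and normalize two source columns.
MAPPINGS = [
    Mapping("CUST_NM", "customer_name", "Customer Name",
            lambda v: v.strip().title()),
    Mapping("CTRY_CD", "country_code", "Country",
            lambda v: v.strip().upper()),
]

SOURCE_ROW = {"CUST_NM": "  acme corp ", "CTRY_CD": "us"}
TARGET_ROW = apply_mappings(SOURCE_ROW, MAPPINGS)
TERM_LINKS = term_to_columns(MAPPINGS)

print(TARGET_ROW)
print(TERM_LINKS)
```

Because each mapping records both the transformation and the business term, the same specification can drive the conversion of the data and describe how terms relate to physical columns.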

The business specification now serves as historical documentation as well as direct input into the generation of the IBM InfoSphere DataStage® ETL jobs. The defined business rules are directly included in the ETL job as either code or annotated to-do tasks for the developer to complete. When the InfoSphere DataStage job is ready, the developer can also decide to deploy the same batch process as an SOA component by using IBM InfoSphere Information Services Director.

Throughout this process, metadata is generated and maintained as a natural consequence of using each of the InfoSphere Information Server modules. The InfoSphere Information Server platform shares relevant metadata with each of the user-specific roles throughout the entire integration process. Because of this unique architecture, managing the metadata requires little manual maintenance. Only third-party metadata requires administration tasks such as defining the relationships to the InfoSphere Information Server metadata objects. Administrators and developers who need to view both InfoSphere Information Server and third-party metadata assets can use InfoSphere Information Governance Catalog to query, analyze, and report on this information from the common repository.