InfoSphere Discovery tasks

You can use InfoSphere® Discovery to identify and document what data you have, where it is located, and how it is linked across systems. You can create and populate data sets, discover data types, discover primary-foreign keys, and organize data into data objects within each data source.

Your organization can use InfoSphere Discovery to complete the following tasks:

Run column analysis
Column analysis automatically discovers many standard column statistics, such as cardinality, data type frequency, value frequency, and minimum and maximum values. You can drill down into statistics from column analysis to obtain different views of how your data is related. You run column analysis on each table within a data set.
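As an illustration of the kinds of statistics listed above, the following minimal Python sketch computes them for a single column held in memory. It is not InfoSphere Discovery code; the function name `column_stats` and the dictionary keys are hypothetical, and the sketch assumes that the non-null values in a column are mutually comparable.

```python
from collections import Counter

def column_stats(values):
    """Compute basic statistics for one column given as a list of values."""
    # Nulls are counted separately and excluded from the other statistics.
    non_null = [v for v in values if v is not None]
    value_freq = Counter(non_null)
    type_freq = Counter(type(v).__name__ for v in non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "cardinality": len(value_freq),          # number of distinct values
        "type_frequency": dict(type_freq),       # how often each data type occurs
        "value_frequency": dict(value_freq),     # how often each value occurs
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }

stats = column_stats([10, 20, 20, None, 35])
```

Running such a function once per column of each table mirrors the per-table scope described above.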

You can also run overlap analysis on multiple data sources simultaneously. Every column is compared with every other column for overlaps. The results are displayed in graph and table formats that you can use to drill down, view, sort, and filter the statistics. In addition to identifying overlapping and unique columns, you can manage the tagging of attributes that you consider critical to your analysis. These critical data elements, or CDEs, are the specific attributes that you want to include in your new target schema if you are migrating data or consolidating data into a new application, metadata management hub, or data warehouse.
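The all-columns-against-all-columns comparison can be pictured with a short Python sketch. This is an assumption-laden illustration, not the product's algorithm: the function `overlap_report`, the `(source, column)` keys, and the sample sources `crm` and `billing` are all hypothetical, and overlap is measured here simply as shared distinct values.

```python
from itertools import combinations

def overlap_report(columns):
    """columns maps a (source, column_name) key to a list of values."""
    # Reduce each column to its set of distinct non-null values.
    distinct = {
        key: {v for v in values if v is not None}
        for key, values in columns.items()
    }
    report = []
    # Compare every column with every other column exactly once.
    for left, right in combinations(distinct, 2):
        shared = distinct[left] & distinct[right]
        report.append({
            "left": left,
            "right": right,
            "shared": len(shared),
            # Share of each column's distinct values that also occur in the other.
            "left_pct": len(shared) / len(distinct[left]) if distinct[left] else 0.0,
            "right_pct": len(shared) / len(distinct[right]) if distinct[right] else 0.0,
        })
    return report

columns = {
    ("crm", "cust_id"): [1, 2, 3, 4],
    ("billing", "customer"): [3, 4, 5],
    ("billing", "region"): ["N", "S"],
}
report = overlap_report(columns)
```

Columns whose percentages are both zero are unique to their source, while high percentages on both sides suggest candidates for tagging as critical data elements.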

Discover and analyze primary-foreign keys
InfoSphere Discovery can discover column matches, which are relationships between the data in two columns of different tables within the same data set. A relationship can be strong or weak. You can set a minimum hit rate and other criteria that a column pair must meet to be considered a match.
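To make the idea of a minimum hit rate concrete, here is a minimal Python sketch. It assumes that the hit rate is the fraction of non-null values in one column that also appear in the other column; the product may define and compute the statistic differently. The names `hit_rate`, `is_match`, and `min_hit_rate` are hypothetical.

```python
def hit_rate(child_values, parent_values):
    """Fraction of non-null child values that are found in the parent column."""
    parent = {v for v in parent_values if v is not None}
    child = [v for v in child_values if v is not None]
    if not child:
        return 0.0
    hits = sum(1 for v in child if v in parent)
    return hits / len(child)

def is_match(child_values, parent_values, min_hit_rate=0.8):
    """Treat a column pair as a match only if it meets the minimum hit rate."""
    return hit_rate(child_values, parent_values) >= min_hit_rate

# Three of the four child values (1, 2, 2) exist in the parent column.
rate = hit_rate([1, 2, 2, 9], [1, 2, 3])
strict = is_match([1, 2, 2, 9], [1, 2, 3], min_hit_rate=0.8)
relaxed = is_match([1, 2, 2, 9], [1, 2, 3], min_hit_rate=0.7)
```

Lowering the threshold admits weaker relationships, which is why the same column pair can be rejected under one setting and accepted under another.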

You can automatically discover matching keys between any two data sources, even if the key is a composite key that involves many columns. You can then prototype different matching keys to determine the best matching key across multiple data sources.

Organize data sets into data objects
InfoSphere Discovery uses the primary-foreign key relationships to group tables into entities composed of related tables. These related tables, or data objects, are logical clusters of all tables in a data set that have one or more columns that contain data that is related to the same business entity. These business objects can be entered into IBM® Optim™ for archiving data and for creating consistent sample data sets for test data management.
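One simple way to picture this grouping is to treat tables as nodes and primary-foreign key relationships as edges, and then collect the connected groups. The Python sketch below does exactly that; it is an approximation for illustration only and is not claimed to be how InfoSphere Discovery builds data objects. The function `group_tables` and the table names are hypothetical.

```python
from collections import defaultdict

def group_tables(tables, relationships):
    """Group tables that are linked, directly or indirectly, by key relationships.

    relationships is a list of (parent_table, child_table) pairs.
    """
    adjacency = defaultdict(set)
    for parent, child in relationships:
        adjacency[parent].add(child)
        adjacency[child].add(parent)

    seen = set()
    groups = []
    for table in tables:
        if table in seen:
            continue
        # Depth-first walk over every table reachable from this one.
        group = set()
        stack = [table]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

tables = ["CUSTOMER", "ORDERS", "LINE_ITEM", "PRODUCT", "AUDIT_LOG"]
relationships = [("CUSTOMER", "ORDERS"), ("ORDERS", "LINE_ITEM")]
groups = group_tables(tables, relationships)
```

In this sample, the three tables linked by keys form one group, and the two tables without relationships each stand alone.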

Data objects are also useful when you start comparing data across sources. Each source might have different data structures and formats. Focusing on each source at the business object level and creating consistent samples of data that can be compared across sources helps to split large data sets into smaller related groups of tables and map those groups across data sources.

Discover and analyze transformations and business rules
InfoSphere Discovery can map two existing systems together to facilitate data migration, consolidation, or integration. InfoSphere Discovery automatically discovers complex cross-source transformations and business rules between two structured data sets.

After discovering substrings, concatenations, cross-references, aggregations, and other transformations, InfoSphere Discovery identifies the specific data anomalies that do not meet the discovered rules. These capabilities help to develop ongoing audit and remediation processes to correct errors and inconsistencies.
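The following Python sketch shows, under stated assumptions, how a discovered rule can be used to flag anomalies. It assumes that a concatenation rule has already been discovered (a target `full_name` equals the source `first_name`, a space, and `last_name`) and that source and target rows are already paired on a key. The function names, column names, and sample rows are hypothetical.

```python
def find_anomalies(joined_rows, rule, target_column):
    """Return the row pairs whose target value does not follow the rule.

    joined_rows is a list of (source_row, target_row) dictionaries that were
    already paired on a matching key.
    """
    anomalies = []
    for source_row, target_row in joined_rows:
        expected = rule(source_row)
        if expected != target_row[target_column]:
            anomalies.append((source_row, target_row, expected))
    return anomalies

def full_name_rule(row):
    """Assumed discovered rule: concatenate first and last name with a space."""
    return f"{row['first_name']} {row['last_name']}"

joined_rows = [
    ({"first_name": "Ana", "last_name": "Diaz"}, {"full_name": "Ana Diaz"}),
    ({"first_name": "Li", "last_name": "Wei"}, {"full_name": "Li  Wei"}),  # extra space
]
anomalies = find_anomalies(joined_rows, full_name_rule, "full_name")
```

The returned list pairs each offending row with the value that the rule expected, which is the kind of detail an audit or remediation process needs.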

Build unified schemas
InfoSphere Discovery includes a complete workbench for analyzing multiple data sources and prototyping the combination of those sources into a consolidated, unified target, such as a master data management (MDM) hub, application, or enterprise data warehouse.

InfoSphere Discovery helps you build unified data table schemas by accounting for known critical data elements and proposing statistics-based matching and conflict resolution rules. Data analysts use these rules to determine which data to consolidate for data migration, MDM, or data warehousing.