You can use InfoSphere® Discovery to identify and document what data you have, where it is located, and how it is linked across systems. Within each data source, you can create and populate data sets, discover data types, discover primary-foreign key relationships, and organize data into data objects.
Your organization can use InfoSphere Discovery to complete the following tasks:
You can run overlap analysis on multiple data sources simultaneously. Every column is compared to every other column for overlapping values. The results are displayed in graph and table formats that you can use to drill down into, view, sort, and filter the statistics. In addition to identifying overlapping and unique columns, you can tag the attributes that you consider critical to your analysis. These critical data elements, or CDEs, are the specific attributes that you want to include in your new target schema if you are migrating data or consolidating data into a new application, metadata management hub, or data warehouse.
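The idea of comparing every column to every other column can be illustrated with a minimal Python sketch. This is not how InfoSphere Discovery is implemented. The function name `column_overlaps`, the input layout, and the sample sources `crm` and `billing` are all hypothetical, and the sketch compares distinct values only.

```python
from itertools import combinations

def column_overlaps(sources):
    """Compare every column with every other column by distinct values.

    `sources` is an assumed layout: {source_name: {column_name: [values]}}.
    Returns one record per column pair that shares at least one value,
    sorted by the number of shared distinct values (largest first).
    """
    columns = {
        (source, column): set(values)
        for source, cols in sources.items()
        for column, values in cols.items()
    }
    results = []
    for (left, left_vals), (right, right_vals) in combinations(columns.items(), 2):
        shared = left_vals & right_vals
        if not shared:
            continue  # no overlap between this pair of columns
        results.append({
            "left": left,
            "right": right,
            "shared": len(shared),
            # Fraction of each column's distinct values found in the other.
            "left_pct": len(shared) / len(left_vals),
            "right_pct": len(shared) / len(right_vals),
        })
    return sorted(results, key=lambda r: r["shared"], reverse=True)

# Hypothetical sample data for illustration only.
sources = {
    "crm": {"cust_id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]},
    "billing": {"customer": [3, 4, 5], "amount": [10, 20, 30]},
}
overlaps = column_overlaps(sources)
```

Columns that never appear in any result are unique to their source. Columns with a high overlap percentage on both sides are natural candidates to review when you tag critical data elements.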
You can automatically discover matching keys between any two data sources, even if the key is a composite key that involves many columns. You can then prototype different matching keys to determine the best matching key across multiple data sources.
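Prototyping matching keys can be pictured as scoring each candidate combination of column pairs and ranking the results. The following Python sketch is an illustration under stated assumptions, not the product's algorithm. The functions `score_key` and `prototype_keys`, the scoring measure, and the sample tables are hypothetical.

```python
from itertools import combinations

def score_key(left_rows, right_rows, column_pairs):
    """Score one candidate key, given as (left_column, right_column) pairs."""
    left_keys = [tuple(row[lc] for lc, _ in column_pairs) for row in left_rows]
    right_keys = [tuple(row[rc] for _, rc in column_pairs) for row in right_rows]
    left_set, right_set = set(left_keys), set(right_keys)
    all_keys = left_set | right_set
    return {
        "columns": column_pairs,
        # Share of distinct key values that appear in both sources.
        "match_rate": len(left_set & right_set) / len(all_keys) if all_keys else 0.0,
        # True when the key identifies exactly one row in each source.
        "unique": len(left_set) == len(left_keys) and len(right_set) == len(right_keys),
    }

def prototype_keys(left_rows, right_rows, candidate_pairs, max_width=3):
    """Try single-column and composite keys; rank unique, high-match keys first."""
    scored = [
        score_key(left_rows, right_rows, combo)
        for width in range(1, max_width + 1)
        for combo in combinations(candidate_pairs, width)
    ]
    return sorted(scored, key=lambda s: (s["unique"], s["match_rate"]), reverse=True)

# Hypothetical sample tables for illustration only.
left = [
    {"first": "Ann", "last": "Lee"},
    {"first": "Ann", "last": "Kim"},
    {"first": "Bob", "last": "Lee"},
]
right = [
    {"fname": "Ann", "lname": "Lee"},
    {"fname": "Ann", "lname": "Kim"},
    {"fname": "Bob", "lname": "Lee"},
]
ranked = prototype_keys(left, right, [("first", "fname"), ("last", "lname")])
best = ranked[0]
```

In this sample, neither column is unique on its own, so the composite key of both columns ranks first. A real comparison would also need to handle nulls, formatting differences, and much larger candidate sets.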
Data objects are also useful when you start comparing data across sources. Each source might have different data structures and formats. Working at the business object level in each source, and creating consistent samples of data that can be compared across sources, helps you split large data sets into smaller groups of related tables and map those groups across data sources.
After discovering substrings, concatenations, cross-references, aggregations, and other transformations, InfoSphere Discovery identifies the specific data anomalies that do not meet the discovered rules. These capabilities help you develop ongoing audit and remediation processes to correct errors and inconsistencies.
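As a rough illustration of checking data against a discovered rule, the following Python sketch flags rows that violate a concatenation rule. It is a simplified example and not the product's rule engine. The helpers `concat_rule` and `find_anomalies`, the column names, and the sample rows are hypothetical.

```python
def concat_rule(target, parts, sep=" "):
    """Build a rule: `target` must equal the `parts` columns joined by `sep`."""
    def check(row):
        return row[target] == sep.join(row[part] for part in parts)
    return check

def find_anomalies(rows, rule):
    """Return the rows that do not satisfy the rule."""
    return [row for row in rows if not rule(row)]

# Hypothetical sample rows for illustration only.
rows = [
    {"first": "Ann", "last": "Lee", "full_name": "Ann Lee"},
    {"first": "Bob", "last": "Kim", "full_name": "Bob Kimm"},
    {"first": "Cy", "last": "Roe", "full_name": "Cy Roe"},
]
rule = concat_rule("full_name", ["first", "last"])
anomalies = find_anomalies(rows, rule)
# Share of rows that satisfy the rule, useful for judging how reliable it is.
hit_rate = 1 - len(anomalies) / len(rows)
```

The rows returned by `find_anomalies` are the ones an audit or remediation process would review. The same pattern extends to other rule types, such as substrings or aggregations, by swapping in a different check function.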
InfoSphere Discovery helps you build unified data table schemas by accounting for known critical data elements and proposing statistics-based matching and conflict resolution rules. Data analysts use these rules to determine which data to consolidate for data migration, master data management (MDM), or data warehousing.