Features
Cross source column overlap analysis – Performs a cross-comparison of all columns across multiple data sources to establish a baseline of overlapping data.
Matching key prototype – Hypothesize and test the quality of matching keys on multiple data sources simultaneously.
Empty target modeling and prototype – Drag, drop and combine attributes from data sources to prototype a new unified schema. View the profiling statistics for the prototype target data.
Precedence Discovery – Automatic generation of attribute matching precedence based on statistical analysis.
Transformation Rule Discovery – IBM InfoSphere Discovery features patented algorithms that automate the discovery of complex business rules between two structured data sets: substrings, concatenations, cross-references, aggregations, case statements, arithmetic equations, etc.
Automatic matching key discovery – Algorithms automatically discover the matching key between two data sources and statistically validate it.
Cross source data preview – Provides a side-by-side preview of data across multiple data sources for the same logical row and allows the analyst to see values that match the business rules and anomalies that do not.
Identification of sensitive information – Workflow supports classification of Personally Identifiable Information (PII)
- Fuzzy value matching for sensitive data classification offers flexibility in identifying sensitive elements, eliminating the need to identify an exact match.
Business object creation – Define complete business objects (logical groupings of related objects, e.g. customer) that serve as essential inputs into information-centric projects such as data integration, master data management, data warehousing, test data management and data archiving using IBM Optim products.
- Archiving volume projection with IBM InfoSphere Optim helps you predict the savings achievable by archiving a data object using a filter.
Project volume reduction for archiving strategies – After establishing referential integrity and building business objects, you can assess how much you may save before archiving any data.
Sandbox for prototyping data object analytics – Provides a workflow that is independent of project tables for performing volume analysis, test data analysis, and data object extract prototyping.
Evaluate filters for generating representative test data – Utilize test points consisting of expressions and other criteria to evaluate the quality of your test filter.
Custom algorithm support – Custom algorithm creation wizard helps you create, test, and deploy custom algorithms, which can be deployed to one project or shared with other projects.
- Selective classification and algorithm-level thresholding provide more control over algorithm selection, eliminating the need to run all algorithms.
Import/Export – Read mapping specs from CSV and generate source maps to CSV.
Standardize business terms – Create and manage your business vocabulary within InfoSphere Business Glossary
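
The cross source column overlap analysis listed above can be illustrated with a small sketch. This is not InfoSphere Discovery's algorithm or API; it is a minimal Python illustration that assumes each source is already loaded in memory as a dictionary of column values. The function name `column_overlap`, the sample sources `crm` and `billing`, and the choice of overlap ratio (shared distinct values divided by the smaller column's distinct count) are all hypothetical.

```python
from itertools import combinations


def column_overlap(sources):
    """Compare every column against every column in the other sources.

    sources: {source_name: {column_name: list of values}} (assumed layout).
    Returns (column_a, column_b, ratio) tuples, highest ratio first, where
    each column is identified as a (source_name, column_name) pair.
    """
    # Reduce each column to its set of distinct non-null values.
    value_sets = {
        (src, col): {v for v in values if v is not None}
        for src, cols in sources.items()
        for col, values in cols.items()
    }
    results = []
    for (a, set_a), (b, set_b) in combinations(value_sets.items(), 2):
        if a[0] == b[0]:
            continue  # only compare columns from different sources
        if not set_a or not set_b:
            continue  # an empty column cannot overlap with anything
        shared = len(set_a & set_b)
        # Fraction of the smaller column's distinct values found in the other.
        ratio = shared / min(len(set_a), len(set_b))
        if ratio > 0:
            results.append((a, b, ratio))
    results.sort(key=lambda r: r[2], reverse=True)
    return results


# Hypothetical sample data for two sources.
crm = {"cust_id": [1, 2, 3, 4], "email": ["a@x.com", "b@x.com", None, "d@x.com"]}
billing = {"customer": [2, 3, 4, 5], "amount": [10.0, 20.0, 30.0, 40.0]}

overlaps = column_overlap({"crm": crm, "billing": billing})
for col_a, col_b, ratio in overlaps:
    print(col_a, col_b, ratio)
```

Column pairs with a high ratio, such as `cust_id` and `customer` here, are natural candidates to test as matching keys. A real tool would also need to handle type normalization and large data volumes, which this sketch ignores.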
Benefits
Speeds time-to-value of information centric projects by automating the data relationship discovery process.
Improves success rates of data-dependent projects by providing a 360-degree view of data assets and their complex relationships across heterogeneous sources.
Increases collaboration – Business objects can easily be discovered, defined and shared with IBM Optim products, allowing reuse and faster deployment.
- Integration with IBM InfoSphere Information Server and its Metadata Asset Manager provides direct access from within InfoSphere Discovery. This integration capability helps you import connection data and metadata from Xmeta, and generate data models into Xmeta from within InfoSphere Discovery.
Reduces development time – IT can prototype and test new transformation rules for completeness before data is physically converted and moved. These rules can then be transferred to IBM InfoSphere FastTrack, where business data analysts can augment them with additional documentation and business logic before generating IBM InfoSphere DataStage Extract, Transform, and Load (ETL) jobs.
Enables data governance by providing a centralized, accurate understanding of data relationships across complex heterogeneous data sources.

