Scenarios for data discovery

These scenarios show how organizations used InfoSphere® Discovery to examine and better understand their data.

Retail: Uncovering relationships

A major clothing retailer watched sales plateau for two consecutive quarters. The retailer needed a new marketing strategy but was unsure where to begin its campaign. With their existing data profiling tool, data analysts could find potential primary keys, but they had to manually instruct the tool to look for each key in other tables. Determining relationships between tables was time-consuming, difficult, and unintuitive. Without a way to discover relationships automatically, data analysts could not make associations across massive amounts of sales data.

By using InfoSphere Discovery, data analysts analyzed the data values in all tables that contained customer data and automatically generated an entity relationship diagram. The diagram showed relationships between data, such as how customer age and demographic information related to specific clothing purchases. Data analysts reviewed existing primary keys and created new primary keys based on these new relationships.
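The idea of finding relationships by comparing data values, rather than relying on declared constraints, can be illustrated with a short Python sketch. This is not InfoSphere Discovery's algorithm, which is not described here. It is a minimal value-overlap heuristic, and the table names, column names, function names, and the 0.9 threshold are all hypothetical.

```python
# Illustrative sketch only: a simple value-overlap heuristic for proposing
# primary-foreign key relationships. All names and data are hypothetical.

def candidate_keys(tables):
    """Return (table, column) pairs whose values are unique and non-null."""
    keys = []
    for table_name, rows in tables.items():
        if not rows:
            continue
        for column in rows[0]:
            values = [row[column] for row in rows]
            if None not in values and len(set(values)) == len(values):
                keys.append((table_name, column))
    return keys


def find_relationships(tables, min_overlap=0.9):
    """Propose (child_table, child_col, parent_table, parent_col, overlap).

    A column is proposed as a foreign key when at least min_overlap of its
    distinct non-null values also appear in a candidate key column of
    another table.
    """
    relationships = []
    for parent_table, parent_col in candidate_keys(tables):
        parent_values = {row[parent_col] for row in tables[parent_table]}
        for child_table, rows in tables.items():
            if child_table == parent_table or not rows:
                continue
            for child_col in rows[0]:
                child_values = {
                    row[child_col] for row in rows if row[child_col] is not None
                }
                if not child_values:
                    continue
                overlap = len(child_values & parent_values) / len(child_values)
                if overlap >= min_overlap:
                    relationships.append(
                        (child_table, child_col, parent_table, parent_col, overlap)
                    )
    return relationships


# Hypothetical sample data: each table is a list of row dictionaries.
tables = {
    "customers": [
        {"customer_id": 1, "age": 34},
        {"customer_id": 2, "age": 27},
        {"customer_id": 3, "age": 51},
    ],
    "purchases": [
        {"purchase_id": 101, "customer_id": 1, "item": "jacket"},
        {"purchase_id": 102, "customer_id": 1, "item": "scarf"},
        {"purchase_id": 103, "customer_id": 3, "item": "jeans"},
        {"purchase_id": 104, "customer_id": 2, "item": "jacket"},
    ],
}

relationships = find_relationships(tables)
```

With this sample data, the only proposed relationship links purchases.customer_id to customers.customer_id. A heuristic like this can produce false matches on small tables, for example when two unrelated integer columns share a range, so a real tool would need more evidence and analyst review.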

Data analysts then used the primary-foreign key relationships to group related tables into business objects. A business object is a logical cluster of all tables in a data set that have one or more columns that contain data related to the same business entity. Using these smaller, targeted business objects, the clothing retailer focused on relationships in its sales data to help drive its marketing strategy.
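One simple way to picture this grouping is to treat tables as nodes and primary-foreign key relationships as edges, then collect the connected clusters. The Python sketch below does this. It is a simplification for illustration, not the product's grouping method, and the table names and the group_tables function are hypothetical.

```python
# Illustrative sketch only: group tables into clusters by following
# primary-foreign key relationships. All names are hypothetical.
from collections import defaultdict


def group_tables(table_names, relationships):
    """Return lists of tables connected through (child, parent) relationships."""
    neighbors = defaultdict(set)
    for child, parent in relationships:
        neighbors[child].add(parent)
        neighbors[parent].add(child)

    seen = set()
    groups = []
    for name in table_names:
        if name in seen:
            continue
        # Depth-first walk over every table reachable from this one.
        group = set()
        stack = [name]
        while stack:
            table = stack.pop()
            if table in seen:
                continue
            seen.add(table)
            group.add(table)
            stack.extend(neighbors[table] - seen)
        groups.append(sorted(group))
    return groups


table_names = ["customers", "purchases", "products", "stores", "store_staff"]
relationships = [
    ("purchases", "customers"),
    ("purchases", "products"),
    ("store_staff", "stores"),
]

business_objects = group_tables(table_names, relationships)
```

Here the five tables fall into two clusters: one around customer purchases and one around stores. Connected components are a coarse approximation, because a single widely referenced lookup table could merge otherwise unrelated clusters.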

Healthcare: Discovering sensitive data

A major hospital recently decided to convert all patient records from paper documents to digital records. Because the hospital has multiple branches, several databases were created to house the digital records. The resulting data was largely structured, but each database contained tables that were formatted inconsistently and contained empty cells. Sensitive data, such as Social Security numbers, blood types, and medication lists, was not labeled appropriately or was merged with other parts of the patient record. Without a unified solution, discovering and analyzing the digital records would require months of human involvement and manual manipulation of data.

The hospital used InfoSphere Discovery to discover statistics about the columns in each of the databases. These statistics were used to develop a detailed understanding of the structure and format of the patient records, which helped to normalize and standardize records. Using the built-in classification algorithms, data analysts identified patterns that matched data in each record, such as patient name, address, and date of birth. By using custom classifications, data analysts isolated sensitive data elements, and enforced enterprise-wide policies to protect these elements, masking them from unauthorized users.
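The three steps in this scenario (column statistics, pattern-based classification, and masking) can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's built-in classifiers or policy engine. The regular expressions, the 0.8 match threshold, the column names, and the sample values are all hypothetical.

```python
# Illustrative sketch only: profile a column, classify it by pattern,
# and mask sensitive values. All names, patterns, and data are hypothetical.
import re

PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "date_of_birth": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def non_empty(values):
    """Drop None and empty-string cells."""
    return [value for value in values if value not in (None, "")]


def profile_column(values):
    """Return basic statistics: total cells, empty cells, distinct values."""
    filled = non_empty(values)
    return {
        "count": len(values),
        "empty": len(values) - len(filled),
        "distinct": len(set(filled)),
    }


def classify_column(values, patterns=PATTERNS, threshold=0.8):
    """Return the first label whose pattern matches enough non-empty values."""
    filled = non_empty(values)
    if not filled:
        return None
    for label, pattern in patterns.items():
        matches = sum(1 for value in filled if pattern.match(value))
        if matches / len(filled) >= threshold:
            return label
    return None


def mask_ssn(value):
    """Hide all but the last four digits of an SSN-formatted string."""
    return "***-**-" + value[-4:]


# A poorly labeled column with one empty cell (made-up values).
field_7 = ["123-45-6789", "", "987-65-4321", "555-12-3456"]

stats = profile_column(field_7)
label = classify_column(field_7)
masked = [mask_ssn(value) if value else value for value in field_7]
```

In this example the unlabeled column is classified as "ssn" from its values alone, and every non-empty value is masked to its last four digits. Format-only matching is a starting point; a production classifier would typically combine it with column statistics and validation rules.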