Column analysis overview

Column analysis is the component of IBM® InfoSphere® Information Analyzer used to assess individual columns of data. You control the scope of the data that is analyzed at one time by selecting the database, tables, and columns to be analyzed.

The system begins by accessing the data source for the data that you selected and building a frequency distribution for each column. The frequency distribution contains one entry for each distinct data value in the column, together with the number of times that value occurs.
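As an illustration of what a frequency distribution holds, here is a minimal Python sketch. It is not the product's implementation; the function name frequency_distribution and the sample column are hypothetical.

```python
from collections import Counter


def frequency_distribution(values):
    """Map each distinct value in a column to the number of times it occurs."""
    return Counter(values)


# Hypothetical sample column; None stands in for a null value.
status_column = ["A", "B", "A", None, "C", "A", "B"]

dist = frequency_distribution(status_column)

# One entry per distinct value, most frequent first.
for value, count in dist.most_common():
    print(repr(value), count)
```

The sample column has seven rows but only four distinct values, so the distribution has four entries. Later steps can work on those entries rather than rescanning every row.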

The system then analyzes the distinct data values in each frequency distribution to derive general observations about each column.
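The sketch below shows, under stated assumptions, how observations of this kind might be derived from a frequency distribution alone. The specific observations (row count, cardinality, null count, value lengths, an all-numeric flag) are illustrative choices, not a list of what the product reports, and the names summarize and _is_number are hypothetical.

```python
from collections import Counter


def _is_number(value):
    """Return True if the value can be parsed as a floating-point number."""
    try:
        float(str(value))
        return True
    except ValueError:
        return False


def summarize(dist):
    """Derive simple column-level observations from a frequency distribution."""
    non_null = [v for v in dist if v is not None]
    lengths = [len(str(v)) for v in non_null]
    return {
        "row_count": sum(dist.values()),   # total rows, nulls included
        "cardinality": len(dist),          # number of distinct values
        "null_count": dist.get(None, 0),
        "all_numeric": bool(non_null) and all(_is_number(v) for v in non_null),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }


# Hypothetical distribution for a column that is mostly, but not entirely, numeric.
dist = Counter(["10", "25", "10", None, "7", "abc"])
summary = summarize(dist)
print(summary)
```

Because the value "abc" is present, the all-numeric flag is false for this sample, which is the kind of observation a reviewer might follow up on.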

The remainder of the column analysis process is driven by your review of the results that the system generates. That review consists of any or all of three parts:

Data classification analysis: Lets you group and organize columns by category. This organization can focus further review on the considerations that matter for each category (for example, numeric columns typically fall within a particular valid range).
Column properties analysis: Lets you assess the data contents against the defined metadata, either validating the integrity of the metadata for use in other systems or identifying columns that are unused or poorly defined.
Data quality controls analysis: Lets you assess the data contents for basic, atomic conditions of integrity, such as completeness and validity. These fundamental assessments of data quality provide the foundation for assertions of trust or confidence in the data.
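To make the completeness and validity checks and the comparison against defined metadata more concrete, here is a minimal Python sketch that works from a frequency distribution. It is an assumption-laden illustration, not the product's logic: the function names, the treatment of None and the empty string as missing, the 0 to 120 age range, and the defined length of 2 are all hypothetical.

```python
from collections import Counter

MISSING = (None, "")  # assumption: values treated as missing


def completeness(dist):
    """Fraction of rows whose value is not missing."""
    total = sum(dist.values())
    missing = sum(count for value, count in dist.items() if value in MISSING)
    return 1.0 - missing / total if total else 0.0


def validity(dist, is_valid):
    """Fraction of non-missing rows whose value passes the is_valid rule."""
    present = {v: c for v, c in dist.items() if v not in MISSING}
    total = sum(present.values())
    valid = sum(c for v, c in present.items() if is_valid(v))
    return valid / total if total else 0.0


def exceeds_defined_length(dist, defined_length):
    """Distinct values longer than the length declared in the metadata."""
    return sorted(
        str(v) for v in dist
        if v is not None and len(str(v)) > defined_length
    )


# Hypothetical numeric column with two nulls and one out-of-range value.
ages = Counter([34, 51, None, 27, 240, 34, None, 19])
age_completeness = completeness(ages)
age_validity = validity(ages, lambda v: 0 <= v <= 120)

# Hypothetical column whose metadata declares a length of 2.
country_codes = Counter(["US", "DE", "USA", "FR"])
too_long = exceeds_defined_length(country_codes, 2)

print(age_completeness, age_validity, too_long)
```

In this sample, six of the eight age rows are populated and five of those six fall inside the assumed range, while one country code is longer than the assumed definition allows.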