Data classification analysis

During a column analysis job, a data class is assigned to each column in your data. A data class categorizes a column according to how the data in the column is used. For example, if a column contains data such as 10/04/07, the Date class is assigned to the column because 10/04/07 is an expression for a date.

By classifying each data column, you gain a better understanding of the type of data in that column and how it is used. To ensure that your data is of good quality, you must assign accurate classes to your data. Data classes are also used by the system during domain analysis to match similar columns.

To assign a data class, the system evaluates the frequency distribution of a column for characteristics such as cardinality (the number of distinct values in a column) and data type. A data type describes the structural format of data in a column. For example, columns that contain numeric data are type N (numeric), and columns that contain alphabetic data are type A (alphabetic). The system uses the frequency distribution results to infer a class for the column. After the analysis completes, you review the inferences and accept or reject them.
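
The profiling step described above can be illustrated with a minimal Python sketch. It builds a frequency distribution for one column, derives the cardinality, and infers a coarse data type. The function name profile_column, the regular expression, and the "other" fallback are assumptions made for illustration only; they are not part of the product, and the actual inference rules might differ.

```python
import re
from collections import Counter

# Hypothetical pattern for numeric values; the product's real rules are not documented here.
NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")

def profile_column(values):
    """Return (frequency distribution, cardinality, coarse data type) for a list of strings."""
    freq = Counter(values)                      # value -> number of occurrences
    cardinality = len(freq)                     # number of distinct values
    distinct = [v for v in freq if v.strip()]   # ignore blank values when typing
    if distinct and all(NUMERIC.fullmatch(v) for v in distinct):
        data_type = "N"                         # numeric
    elif distinct and all(v.replace(" ", "").isalpha() for v in distinct):
        data_type = "A"                         # alphabetic
    else:
        data_type = "other"                     # assumed fallback for mixed content
    return freq, cardinality, data_type

freq, cardinality, data_type = profile_column(["10", "20", "10", "35"])
print(cardinality, data_type, freq["10"])
```

In this sketch, the column has three distinct values, all of them numeric, so the inferred type is N.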

One of eight data classes is inferred for each column:
Identifier
A column that contains values that reference a unique entity. For example, a column with the class of Identifier might be a primary key or contain unique information such as a customer number.
Indicator
A column that contains only two values. For example, a column with the class of Indicator might contain data such as true and false or yes and no.
Code
A column that contains code values that represent a specific meaning. For example, a column with the class of Code might contain data about the area code in a telephone number.
Date
A column that includes chronological data. For example, a column with the class of Date might contain data such as 10/10/07.
Quantity
A column that contains data about the numerical value of something. For example, a column with the class of Quantity might contain data about the price of an object.
Large Object
A column that uses a BLOB data type. For example, a column with the class of Large Object might contain an array.
Text
A column that contains free-form alphanumeric data. For example, a column with the class of Text might contain data about the name of a company or person.
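
To make the class definitions above concrete, the following Python sketch maps a column's values to one of the listed classes by using simple heuristics. It is an illustration under stated assumptions, not the product's inference algorithm: the function name infer_data_class, the is_blob flag, the code_ratio threshold, the date pattern, and the order of the checks are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns; real date and number detection would be more thorough.
DATE = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")
NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")

def infer_data_class(values, is_blob=False, code_ratio=0.5):
    """Guess a data class for a column of strings (illustrative heuristics only)."""
    if is_blob:
        return "Large Object"        # column is stored as a BLOB data type
    total = len(values)
    if total == 0:
        return None                  # nothing to infer from
    freq = Counter(values)
    cardinality = len(freq)          # number of distinct values
    if all(DATE.fullmatch(v) for v in freq):
        return "Date"                # every value looks like 10/10/07
    if cardinality == 2:
        return "Indicator"           # only two values, such as yes and no
    if cardinality == total:
        return "Identifier"          # every value is unique
    if cardinality / total <= code_ratio:
        return "Code"                # assumed: few distinct values, often repeated
    if all(NUMBER.fullmatch(v) for v in freq):
        return "Quantity"            # numeric values, such as prices
    return "Text"                    # free-form alphanumeric data

print(infer_data_class(["yes", "no", "yes"]))
print(infer_data_class(["212", "415", "212", "415", "212", "650"]))
```

Because the checks run in a fixed order, a column can satisfy more than one rule. That ambiguity is one reason to review each inferred class and accept or reject it after the analysis completes.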