Data classification
A data class is an asset that categorizes database columns and data file fields according to the type of data that they contain and how that data is used. Data classification is the process of assigning a data class to a database column. IBM® InfoSphere® Information Analyzer assigns data classes automatically during a column analysis job. You can also classify data manually in IBM InfoSphere Information Governance Catalog.
Predefined data classes are automatically installed with IBM InfoSphere Information Analyzer. These data classes can be displayed and used by IBM InfoSphere Information Governance Catalog. In addition to predefined data classes, you can create data classes in InfoSphere Information Analyzer and in InfoSphere Information Governance Catalog.
Use data classes to organize database columns and data file fields for review and subsequent column analysis work. For example, database columns with numeric data typically include numbers within a range of valid values.
Action | InfoSphere Information Analyzer | InfoSphere Information Governance Catalog |
---|---|---|
Create a data class | No | Yes, but not of type JAVA |
Edit a data class | No | Yes |
Delete a data class | No | Yes |
View and query a data class | No | Yes |
Classify an asset by using a data class | Yes, for database columns | Yes, for database columns and data file fields |
Assign a data class to a collection | No | Yes |
Assign terms, information governance rules, stewards, custom attributes, and labels to a data class | No | Yes |
Set data classification on an asset | Yes, for database columns | Yes, for database columns and data file fields |
View data classification of an asset | Yes | Yes |
Remove data classification from an asset | Yes, for selected data classifications | Yes |
Analyze an asset according to its data classification | Yes, if the data classification is enabled | No |
- Column Name Match
  - The column name filter for data classes. A column is analyzed against the data class only if the name of the column matches the filter.
- Confidence
  - A value from 1 to 100 that measures the overall quality of the data in a data source and whether it met expectations. This property is determined by InfoSphere Information Analyzer and cannot be changed in InfoSphere Information Governance Catalog.
- Data type
  - Only data that matches the data type is used in analysis by InfoSphere Information Analyzer.
- Detected
  - The data classification was found by InfoSphere Information Analyzer during column analysis.
- Enabled
  - The data classification is used when you run a column analysis job in InfoSphere Information Analyzer.
- Example
  - Text that is an example of a match.
- Maximum data length
  - The maximum character count of a value. The maximum data length must be equal to or greater than the minimum data length.
- Minimum data length
  - The minimum character count of a value.
- Selected
  - The data classification was reviewed and approved for use in column analysis by InfoSphere Information Analyzer.
- State
  - This property is determined by InfoSphere Information Analyzer and cannot be changed in InfoSphere Information Governance Catalog.
- Threshold
  - The percentage of data that must match the properties of the data class. The percentage is an integer value.
- Valid Value Reference File
  - A file that contains a list of valid values. It must be referenced by a valid URL, for example http://www.ibm.com:80/my/path/to/mydataclass.txt, or file:///my/path/to/mydataclass.txt. The file must be available to all IBM InfoSphere Information Server engine tiers, and placed in the same location on each tier.
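To make the interaction between these properties concrete, the following Python sketch shows one way that a column name filter, minimum and maximum data lengths, a list of valid values, and a threshold could be combined to decide whether a column matches a data class. It is an illustration only and does not reproduce how InfoSphere Information Analyzer evaluates data classes. The names `DataClassSketch`, `value_matches`, and `classify_column` are hypothetical, and treating the column name filter as a regular expression is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class DataClassSketch:
    """Hypothetical stand-in for a data class definition; not an IBM API."""
    name: str
    column_name_match: Optional[str] = None  # assumption: filter is a regular expression
    min_length: int = 0                      # minimum character count of a value
    max_length: Optional[int] = None         # maximum character count of a value
    threshold: int = 100                     # integer percentage of values that must match
    valid_values: Optional[Set[str]] = None  # stand-in for a valid value reference file

    def __post_init__(self) -> None:
        # The maximum data length must be equal to or greater than the minimum.
        if self.max_length is not None and self.max_length < self.min_length:
            raise ValueError("maximum data length is less than minimum data length")
        if not 0 <= self.threshold <= 100:
            raise ValueError("threshold must be an integer percentage from 0 to 100")


def value_matches(dc: DataClassSketch, value: str) -> bool:
    """Check one value against the length limits and the valid value list."""
    if len(value) < dc.min_length:
        return False
    if dc.max_length is not None and len(value) > dc.max_length:
        return False
    if dc.valid_values is not None and value not in dc.valid_values:
        return False
    return True


def classify_column(dc: DataClassSketch, column_name: str, values: Iterable[str]) -> bool:
    """Return True if the share of matching values reaches the threshold."""
    # Skip the column entirely when its name does not match the filter.
    if dc.column_name_match and not re.search(dc.column_name_match, column_name):
        return False
    values = list(values)
    if not values:
        return False
    matched = sum(value_matches(dc, v) for v in values)
    return 100 * matched / len(values) >= dc.threshold


state_code = DataClassSketch(
    name="State code",
    column_name_match=r"(?i)state",
    min_length=2,
    max_length=2,
    threshold=75,
    valid_values={"NY", "CA", "TX"},
)
print(classify_column(state_code, "STATE_CD", ["NY", "CA", "TX", "ZZ"]))  # True (75% match)
print(classify_column(state_code, "CITY", ["NY", "CA", "TX"]))            # False (name filter)
```

In this sketch, three of the four values in `STATE_CD` pass the length and valid value checks, which gives 75 percent and meets the threshold. The `CITY` column is never evaluated because its name does not match the filter.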