Classifying an asset according to its data

Use data classification to identify database columns and data file fields according to their type.

Before you begin

  • To create, edit, delete, and assign data classes to assets, you must have the Information Governance Catalog Information Asset Author role or higher.
  • To browse and to query data classes, you must have the Information Governance Catalog User role or higher.

About this task

IBM® InfoSphere® Information Analyzer analyzes data and detects its data classification. In InfoSphere Information Analyzer and IBM InfoSphere Information Governance Catalog, you can manually assign a data classification to an asset. Only one data class can be assigned to an asset.

When you edit a data class, you can assign labels, stewards, terms, and information governance rules. In addition, you can define other attributes of the data class and classify assets according to the data class.

Detected classifications from InfoSphere Information Analyzer can be viewed, but not removed, in InfoSphere Information Governance Catalog.

A classification that was selected in either InfoSphere Information Analyzer or InfoSphere Information Governance Catalog can be cleared and removed from InfoSphere Information Governance Catalog.

Procedure

  1. Click Catalog.
  2. Click Information Assets. Do any of the following actions.
    Task Steps
    Create a data class
    1. In the Manage pane, select Create Data Class.
    2. Type a unique name and select a data class type. Optionally, specify a parent data class and a short description.
    3. Click Save and Edit Details to open the new data class and define its properties.
    Edit a data class or classify an asset according to a data class
    1. In the Browse Hierarchies pane, click Data Classes.
    2. In the Browse Data Classes window, select the data class that you want to edit, and then select Edit from the menu.
    3. To assign the data class to assets, do these steps:
      1. Expand Classifies Assets.
      2. Select the asset type, and then type the asset name in the empty field. Alternatively, press the Down Arrow key to get a list of all assets of that asset type.
    4. To specify the type of the data class and define its properties, do these steps:
      1. In the Type list, change the type of the data class, and then click Save.

        If you change the data class type, all values of the previous type are deleted from any asset that is classified by the data class.

      2. Expand Definition. The fields in the Definition pane vary according to the data class type.
        • If the data class type is Valid Value, type a single value in the Valid Values field. To add a value, click the Plus icon at the end of the field. You can add up to 250 values. To define a data class with more valid values, specify a file with valid values in the Valid Value Reference File field. If you define values in both Valid Values and Valid Value Reference File fields, values from both fields are merged.
        • If the data class type is Regex, type a single expression in the Regular Expression field. The expression is not validated for correctness.
        • To specify the percentage of data values that must match the data class, type an integer value in the Threshold field.
        • To specify the minimum and maximum character count of the asset value, type integer values in the Minimum Data Length and Maximum Data Length fields.
    5. To enable extended data classification during database column analysis in InfoSphere Information Analyzer, do these steps:
      1. Expand General Information.
      2. In the Enabled field, click True.
    6. Click Save.
    Delete a data class from the catalog
    1. In the Browse Hierarchies pane, select Data Classes.
    2. In the Browse Data Classes window, select the data class that you want to delete.
    3. Select Delete from the menu. The data class and its child data classes are deleted.
    Remove an unselected data classification that was made in InfoSphere Information Governance Catalog from an asset, or mark a data classification as not selected
    1. In the Browse Hierarchies pane, click Data Classes.
    2. In the Browse Data Classes window, select the data class that you want to remove from an asset, and then select Edit from the menu.
    3. Expand Data Classification.
    4. To remove a data classification from an asset, or to remove the Selected status of a data classification, click Deselect.
    5. Click Save.
  3. To search for data classes, click Search, and then click the Options link. Search for the data class or for assets that have data classes that are assigned to them.
  4. To query for data classes or for data classifications, click Queries.

    You can use the published query Database Columns and their Data Classifications to get information about all data classifications for database columns. You can also use the published query Data Classes and their Classifications to get information about all data classes.
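
The merge behavior of the Valid Values and Valid Value Reference File fields, described in the Definition step of the procedure, can be pictured with a short Python sketch. This is an illustration under stated assumptions, not product code: the function merge_valid_values, the one-value-per-line reference file format, and the removal of duplicates are hypothetical. Only the 250-value limit of the Valid Values field comes from the procedure.

```python
MAX_INLINE_VALUES = 250  # limit of the Valid Values field, per the procedure


def merge_valid_values(inline_values, reference_lines):
    """Merge inline values with values read from a reference file.

    Hypothetical helper: assumes one value per line in the reference file
    and drops blank lines and values that were already seen.
    """
    if len(inline_values) > MAX_INLINE_VALUES:
        raise ValueError("the Valid Values field accepts at most 250 values")
    merged = list(inline_values)
    seen = set(merged)
    for line in reference_lines:
        value = line.strip()
        if value and value not in seen:
            seen.add(value)
            merged.append(value)
    return merged


# Two inline values plus three reference-file lines (one duplicate, one blank).
merged = merge_valid_values(["US", "DE"], ["DE\n", "FR\n", "\n"])
print(merged)
```

In practice, reference_lines could be an open file object, because iterating over a file yields one line at a time.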

Example: Classifying a database column

Suppose that you have a database column that you want to classify as a column that contains national identity numbers. In InfoSphere Information Governance Catalog, you create a data class that is called Nat_ID, and then set the following parameters of the data class:
  • Data class type is Regex
  • Data type is string
  • Threshold is 90
  • Minimum data length is 9
  • Maximum data length is 9
  • Regular expression is ^[0-9]{9}$

You run column analysis in InfoSphere Information Analyzer. If at least 90% of the values in the database column match the regular expression and the length limits, you can classify the database column as a column of national identity numbers.
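
The threshold check in this example can be approximated outside the product. The following Python sketch is illustrative only and is not the InfoSphere Information Analyzer implementation: the function matches_data_class, the sample values, and the reading of the threshold as "at least this percentage of values must match" are assumptions. Because the Regular Expression field is not validated, compiling the expression locally first is one way to catch syntax errors before you save the data class.

```python
import re


def matches_data_class(values, pattern, threshold, min_len, max_len):
    """Return True if at least `threshold` percent of values fit the class.

    Hypothetical helper that mirrors the Nat_ID settings: a value counts as
    a match when its length is within bounds and the whole value matches
    the regular expression.
    """
    regex = re.compile(pattern)  # raises re.error if the expression is malformed
    if not values:
        return False
    hits = sum(
        1
        for v in values
        if min_len <= len(v) <= max_len and regex.fullmatch(v)
    )
    return 100 * hits / len(values) >= threshold


# Nine well-formed values and one malformed value: exactly 90% match.
sample = ["123456789"] * 9 + ["12-345-67"]
result = matches_data_class(sample, r"^[0-9]{9}$", 90, 9, 9)
print(result)
```

With the sample data, nine of ten values match, so the column meets the threshold of 90.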