Data classification in the InfoSphere Information Analyzer thin client

A data class is an asset that categorizes database columns and data file fields according to the type of the data and how the data is used. During a column analysis job, InfoSphere® Information Analyzer classifies data by assigning a data class to a database column. You can also classify data manually in InfoSphere Information Governance Catalog.

Data classes in InfoSphere Information Server

The following table lists all existing data classes that are enabled for InfoSphere Information Server. Note: In the "Data that is considered during classification" column, metadata includes information such as the name of the column, inferred types, inferred formats, the number of formats, counts, and the number of distinct values. If only metadata is considered during classification, the actual column data is not taken into account.
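To make the distinction between metadata and column data concrete, here is a minimal Python sketch of a metadata-only check. The `ColumnMetadata` class, its field names, and the `looks_like_indicator` rule (exactly two distinct values, loosely based on the Indicator description in the table) are assumptions made for illustration; they are not the product's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ColumnMetadata:
    """Hypothetical container for the kinds of metadata named in the note."""
    name: str            # column name
    inferred_type: str   # for example "string" or "int"
    format_count: int    # number of inferred formats
    row_count: int       # number of values in the column
    distinct_count: int  # number of distinct values


def looks_like_indicator(meta: ColumnMetadata) -> bool:
    """Metadata-only check (assumed rule): the column values are never read."""
    return meta.row_count > 0 and meta.distinct_count == 2


flag = ColumnMetadata("MAKE_BUY", "string", 1, 5000, 2)
print(looks_like_indicator(flag))  # True
```

Because the function receives only the profile of the column, two columns with very different values but identical metadata get the same result, which is the behavior that the note describes for metadata-only classification.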
Table 1. Data classes in InfoSphere Information Server
Data Class Description Type Data that is considered during classification
Account number A string representing an account number. The column name is analyzed to see if it matches the following regular expression: columnaccount|acc|accnumber|accnum|accno|accountnumber Regex Column data
Address Line 1 Address Line 1 of a multi-line address.

The address values are classified based on the following logic:
  1. The column name is checked to see if it contains the substring 'addr' and ends with '1/2/3' or 'one/two/three'.
  2. The data type is checked to verify that it is a string.
  3. The column length is checked to verify that it is between 5 and 100 characters.
  4. The cardinality is verified. It should be greater than 75%.
  5. If there are more than 5 values, the number of formats relative to the distinct values should be greater than 30%.

If you want to find columns named Address or Addr that do not contain the suffixes 1, 2, or 3, create a new data class with the same code and a modified regular expression to support country-specific or suffix-specific expressions.
Java Column data and metadata
Address Line 2 Address Line 2 of a multi-line address. Java Column data and metadata
Address Line 3 Address Line 3 of a multi-line address. Java Column data and metadata
Airport Code A string representing the IATA airport code. Value list Column data
Alabama State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Alabama. Regex Column data
Alaska State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Alaska. Regex Column data
Alberta Province Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the Canadian province Alberta. Regex Column data
American Express Card

(sub-category of Credit Card)
A 16-18 character number that identifies an American Express credit card account. Java Column data
Arizona State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Arizona. Regex Column data
Arkansas State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Arkansas. Regex Column data
BIC A string representing a Business Identifier Code. Java Column data
Boolean Numeric or alpha code for boolean values: either 0 or 1, True or False, or Yes or No. Value list Column data
British Columbia Province Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the Canadian province British Columbia. Regex Column data
California Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state California. Regex Column data
Canada Post Code A system of postal codes that are used by Canada Post. Regex Column data
Canada Province Code Two-letter alphabetic codes that are used to identify Canadian provinces and territories. Value list Column data
Canada Province Name The names of the Canadian provinces and territories. Value list Column data
Canadian Social Insurance Number A social insurance number (SIN) is a number issued in Canada to administer various government programs. Java Column data
City A name of a place such as a city or town. Value list Column data
Code System-defined data values from a domain data value set, each of which has a specific meaning, for example, product status codes. Java Metadata
Colorado State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Colorado. Regex Column data
Colors A string representing the name of a color. Value list Column data
Commercial and Government Entity Code The CAGE code represented by a string of five characters. Java Column data and metadata
Computer Host Name A hostname is a label that is assigned to a device connected to a computer network and is used to identify the device. Java Column data
Connecticut State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Connecticut. Regex Column data
Country Code A standard code defined for most of the countries and dependent areas in the world. Value list Column data
Country Name Specifies the name of any country. Value list Column data
Credit Card Number A credit card number. Java Column data
Currency A number preceded or followed by a currency symbol. The following currencies are supported:
AED, ARS, AUD, BRL, CAD,
CHF, CNY, CZK, DM, DKK, EGP, EUR,
GBP, HKD, KRW, HRK, HUF, IDR, INR,
JPY, MXN, MYR, NOK, NZD, PLN, RON,
RUB, SAR, SGD, SEK, TRY, UAH, USD, ZAR,
$, €, £
Java Column data
Current Procedural Terminology CPT medical code set. Java Column data and metadata
Customer number A string representing a customer number. Regex Column data and metadata
Date Data values which are specific date, time, or duration references, for example, a product order date. Java Column data
Date of Birth Data values which are dates, and represent a date of birth. Java Column data and metadata
Delaware State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Delaware. Regex Column data
Diners Club Card

(sub-category of Credit Card)
A 15-18 character number that identifies a Diners Club credit card account. Java Column data
Discover Card

(sub-category of Credit Card)
A 17-18 character number that identifies a Discover Card credit card account. Java Column data
Driver's License A string representing a driver's license. Regex Column data and metadata
DUNS Number A unique numeric identifier assigned by Dun & Bradstreet (D&B) to a business entity. Regex Column data and metadata
Email Address An email address identifies an email box to which email messages are delivered. Java Column data
Eye Color A string representing the eye color of an individual. Value list Column data and metadata
First Name

(sub-category of Person Name)
First name of an individual. Java Column data
Florida State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Florida. Regex Column data
Fortune 1000 company

(sub-category of Name of an Organization)
A string representing the name of a company from the Fortune 1000 list. Java Column data
French INSEE Number The INSEE code is a numerical indexing code used by the French National Institute for Statistics and Economic Studies (INSEE) to identify various entities, including communes and départements. Java Column data
Gender An alpha code setting for gender. Either M or F, or Male or Female. Value list Column data
Geographic Coordinates A string representing the longitude and latitude in degrees. Java Column data
Georgia State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Georgia. Regex Column data
Germany car registration number A string representing a registration number for a German car. Java Column data
Hair Color A string representing the hair color of an individual. Value list Column data and metadata
Hawaii State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Hawaii. Regex Column data
Honorific Salutation of a person added before the first name (name prefix). Value list Column data
IBAN A string representing an International Bank Account Number. Java Column data
ICD-10 The 10th revision of the International Statistical Classification of Diseases and Related Health Problems, a medical classification list by the World Health Organization (WHO). Java Column data
Idaho State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Idaho. Regex Column data
Identifier Non-intelligent data values that are typically unique, and are used to reference a specific entity, for example, a product number. Java Metadata
Illinois State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Illinois. Regex Column data
INCO Terms (International Commercial Terms) A 3-character string representing INCO Terms. Value list Column data
Indiana State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Indiana. Regex Column data
Indicator Code values that have only two mutually-exclusive values in the domain set, for example, a product Make/Buy indicator, that are often called Flags. Java Metadata
Individual Taxpayer Identification Number A 9-digit tax processing number issued by the Internal Revenue Service (IRS) to individuals who are required to have a US taxpayer identification number but who do not have, and are not eligible to obtain, an SSN. Regex Column data and metadata
International Securities Identification Number An International Securities Identification Number (ISIN) uniquely identifies a security. Java Column data
International Standard Book Number The International Standard Book Number (ISBN) is a 13-digit number assigned by standard book numbering agencies to control and facilitate activities within the publishing industry. Java Column data
International Standard Industrial Classification A string representing International Standard Industrial Classification of All Economic Activities. Java Column data and metadata
Internet Protocol Address An Internet Protocol address (IP address) is a numerical label assigned to each device (e.g., computer, printer) participating in a computer network that uses the Internet Protocol for communication. Regex Column data
Internet Protocol Version 6 Address Internet Protocol version 6 (IPv6) is the latest version of the Internet Protocol. Regex Column data
Iowa State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Iowa. Regex Column data
Ireland Eircode A system of postal codes that are used by An Post, Ireland's postal service. Regex Column data
ISO 3166-2 Code A string representing the ISO 3166-2 code of a state or province of a country. Value list Column data
Italian Fiscal Code The Italian fiscal code card, officially known as Italy's Codice Fiscale, is the tax code card in Italy. Regex Column data
Japan CB

(sub-category of Credit Card)
A 17-18 character number that identifies a Japanese Credit Bureau (JCB) credit card account. Java Column data
Kansas State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Kansas. Regex Column data
Kentucky State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Kentucky. Regex Column data
Latitude A decimal number or a string representing the latitude in degrees. Java Column data and metadata
Longitude A decimal number or a string representing the longitude in degrees. Java Column data and metadata
Louisiana State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Louisiana. Regex Column data
Maine State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Maine. Regex Column data
Manitoba Province Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the Canadian province Manitoba. Regex Column data
Marital/Civil Status A string representing the relationship status of an individual. Value list Column data
Maryland State Driver's License A string representing the driver's license in the US state Maryland. Regex Column data
Massachusetts State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Massachusetts. Regex Column data
Master Card

(sub-category of Credit Card)
A 17-18 character number that identifies a Master Card credit card account. Java Column data
Michigan State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Michigan. Regex Column data
Minnesota State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Minnesota. Regex Column data
Mississippi State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Mississippi. Regex Column data
Missouri State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Missouri. Regex Column data
Montana State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Montana. Regex Column data
Month A string or integer value representing a month in a date. Java Column data
Name of an Organization A string representing the name of an organization. Java Column data
Name Suffix Name suffix of a person. Value list Column data
Nebraska State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Nebraska. Regex Column data
Nevada State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Nevada. Regex Column data
New Brunswick Province Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the Canadian province New Brunswick. Regex Column data
New Foundland and Labrador Province Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the Canadian province Newfoundland and Labrador. Regex Column data
New Hampshire State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state New Hampshire. Regex Column data
New Jersey State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state New Jersey. Regex Column data
New Mexico State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state New Mexico. Regex Column data
New York State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state New York. Regex Column data
NoClassDetected Do not delete this data class. It is used to display the count of distinct data values that did not match any of the other data classes that were enabled during the most recent column analysis for a given column. Data Not applicable
North Carolina State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state North Carolina. Regex Column data
North Dakota State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state North Dakota. Regex Column data
Nova Scotia Province Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the Canadian province Nova Scotia. Regex Column data
Ohio State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Ohio. Regex Column data
Oklahoma State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Oklahoma. Regex Column data
Ontario Province Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the Canadian province Ontario. Regex Column data
Oregon State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Oregon. Regex Column data
Passport Number Passport number is the unique ID assigned to a travel document, usually issued by the government of a nation, that certifies the identity and nationality of its holder for the purpose of international travel. Regex Column data
Pennsylvania State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Pennsylvania. Regex Column data
Percentage A number representing a percentage. Regex Column data
Person Name The name of an individual. Java Column data
Prince Edward Island Province Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the Canadian province Prince Edward Island. Regex Column data
Quantity Numerical data values that could be used in a computation, for example, a product price. Java Metadata
Quebec Province Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the Canadian province Quebec. Regex Column data
Rhode Island State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Rhode Island. Regex Column data
Routing Transit Number A 9-digit code, used in the United States, identifying financial institutions. Java Column data
Saskatchewan Province Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the Canadian province Saskatchewan. Regex Column data
South Carolina State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state South Carolina. Regex Column data
South Dakota State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state South Dakota. Regex Column data
Spanish Fiscal Identification Number The NIF is the Spanish tax identification number. Regex Column data
State/Province name The name of a state or province of a country. Value list Column data
Temperature A number representing a temperature. Java Column data
Tennessee State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Tennessee. Regex Column data
Texas State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Texas. Regex Column data
Text Free-form alphanumeric data values from an unlimited domain set, for example, a product description. Java Metadata
UK National Insurance Number A number used in the United Kingdom (UK) in the administration of the National Insurance or Social Security System. Regex Column data
UK Post Code A system of postal codes that are used by UK's Royal Mail. Regex Column data
UK Province Code Two-letter alphabetic codes that are used to identify UK provinces. Value list Column data
Uniform Resource Locator A URL is one type of Uniform Resource Identifier (URI), the generic term for all types of names and addresses that refer to objects on the World Wide Web. Java Column data
Universal Product Code The Universal Product Code (UPC) is a barcode symbology that is widely used in many countries for tracking trade items in stores. Java Column data
United States Standard Industrial Classification A 4-digit number used to classify industries in the United States. Java Column data and metadata
US County The name of a US county. Value list Column data
US Employer Identification Number A 9-digit number that identifies a US employer, typically in nn-nnnnnnn format with the dash (-) being optional. Issued by the IRS. Regex Column data and metadata
US National Drug Code A 10-digit US National Drug Code (NDC), represented in 4-4-2, 5-3-2, or 5-4-1 format, often without dashes. The UPC-A bar code of an NDC-coded product package embeds its NDC code. Java Column data and metadata
US Phone Number US Phone Number is a string of specific numbers that a telephone or cell phone user can dial to reach another telephone or mobile phone in the United States (US). Regex Column data
US Social Security Number In the United States, a Social Security number (SSN) is a unique 9-digit number issued to US citizens, permanent residents, and temporary (working) residents. Regex Column data
US Social Security Number Last 4 The last four digits of a United States Social Security Number (SSN). Regex Column data and metadata
US State Capital Name

(sub-category of City)
Specifies the name of the capital of a US state or territory. Value list Column data
US State Code Two-letter alphabetic codes used to identify US states and certain other associated areas. Value list Column data
US State Name Specifies the name of US states and territories. Value list Column data
US Zip US ZIP codes are a system of postal codes used by the United States Postal Service (USPS) since 1963. Java Column data
Utah State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Utah. Regex Column data
Vehicle Identification Number A vehicle identification number (VIN), also called a chassis number, is a unique code, including a serial number, used by the automotive industry to identify individual motor vehicles. Java Column data
Vermont State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Vermont. Regex Column data
Virginia State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Virginia. Regex Column data
VISA Card

(sub-category of Credit Card)
A 17-18 character number that identifies a VISA credit card account. Java Column data
Washington DC Driver's License

(sub-category of Driver's License)
A string representing the driver's license in Washington, DC, in the US. Regex Column data
Washington State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Washington. Regex Column data
West Virginia State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state West Virginia. Regex Column data
Wisconsin State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Wisconsin. Regex Column data
Wyoming State Driver's License

(sub-category of Driver's License)
A string representing the driver's license in the US state Wyoming. Regex Column data
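
The five-step logic described for the Address Line 1 data class in Table 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the product's Java implementation: the `ColumnProfile` class and its field names are hypothetical, "cardinality" is assumed to mean distinct values divided by total values, "more than 5 values" is assumed to refer to distinct values, and the regular expression is one possible reading of the column name rule.

```python
import re
from dataclasses import dataclass

# Step 1 (assumed reading): name contains "addr" and ends with
# 1/2/3 or one/two/three, compared case-insensitively.
ADDRESS_NAME = re.compile(r"addr.*(?:[123]|one|two|three)$", re.IGNORECASE)


@dataclass
class ColumnProfile:
    """Hypothetical profile of one column; all field names are assumptions."""
    name: str
    data_type: str
    length: int          # column length in characters
    row_count: int       # total number of values
    distinct_count: int  # number of distinct values
    format_count: int    # number of inferred formats


def looks_like_address_line(col: ColumnProfile) -> bool:
    # Step 1: column name check.
    if not ADDRESS_NAME.search(col.name):
        return False
    # Step 2: the data type must be a string.
    if col.data_type.lower() != "string":
        return False
    # Step 3: column length between 5 and 100 characters.
    if not 5 <= col.length <= 100:
        return False
    # Step 4: cardinality (distinct / total, assumed) greater than 75%.
    if col.row_count == 0 or col.distinct_count / col.row_count <= 0.75:
        return False
    # Step 5: with more than 5 values, formats / distinct values > 30%.
    if col.distinct_count > 5 and col.format_count / col.distinct_count <= 0.30:
        return False
    return True


candidate = ColumnProfile("ADDRESS_LINE_1", "string", 60, 1000, 900, 400)
print(looks_like_address_line(candidate))  # True
```

A column named ADDRESS with the same profile fails the first check in this sketch, which mirrors the remark in the table that columns without the 1, 2, or 3 suffix need a separate data class with a modified regular expression.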

You can enable additional data classes by following the steps listed here. Those additional data classes are experimental and are disabled by default.