Information Management IBM InfoSphere Master Data Management, Version 11.3

InfoSphere MDM and IBM InfoSphere Global Name Recognition integration

IBM® InfoSphere® Global Name Recognition (GNR) is a name-recognition and name-scoring technology that classifies, searches, analyzes, and compares global name data sets. It is designed for managing data on individuals whose names come from many cultures. Integrating the GNR APIs with your MDM operational server lets candidate selection cast a wider net by considering name variants in addition to the name as entered.

The integration of a virtual MDM implementation and IBM Global Name Recognition (GNR) is achieved by using the GNRMETA bucket generation function in your algorithm configuration.

The operational server uses the IBM NameWorks (GNM) analyze component to provide name variants that can be used by the operational server during candidate selection. The analyze() method transliterates and parses the name. The method further provides gender information, a culture classification, a list of variant name forms for the name (name parts), and a list of countries where the name is found (country of association information).

Prerequisites
  • Operational server - 11.3 or greater
  • InfoSphere MDM Workbench - 11.3 or greater

Restriction: The use of GNR is not supported on pLinux.

The GNR C++ libraries (analytics and NameDataObject) and the GNR data directory (with the nameworks.config file) are packaged with the operational server. The C++ libraries are in the following locations:
  • Microsoft Windows: MAD_INSTALL_HOME\bin
  • IBM AIX®, Linux, or Solaris: MAD_INSTALL_HOME/lib

The GNR data files are in MAD_INSTALL_HOME/conf/gnr-data.

The connection between the operational server and GNR is configured by using either the madconfig enable_gnr utility target or the Enable/Disable GNRMETA job in InfoSphere MDM Workbench.

The virtual MDM bucket generation function, GNRMETA, calls out to GNR for the name variants and corresponding percentages that are produced by the analyze() method. Each percentage is the frequency of a particular variant relative to the other variants. The variants are then filtered against a percentage threshold: only variants whose percentage is greater than or equal to the threshold are used in bucketing. The threshold is configured with the derivation argument percent=value.
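
The threshold filter described above can be illustrated with a minimal Python sketch. This is not the GNR API or the operational server's code: the function name filter_variants and the variant percentages are invented for illustration only.

```python
def filter_variants(variants, percent):
    """Keep variants whose frequency percentage meets the threshold.

    variants: list of (variant_name, percentage) pairs (hypothetical shape,
              not the actual structure returned by GNR analyze()).
    percent:  integer threshold, as in the percent=value derivation argument.
    """
    return [name for name, pct in variants if pct >= percent]


# Invented percentages for illustration; real values come from GNR.
variants = [("OMAR", 62), ("UMAR", 25), ("AMAR", 11), ("OMER", 2)]

# With percent=10, the 2% variant is dropped before bucketing.
print(filter_variants(variants, 10))  # ['OMAR', 'UMAR', 'AMAR']
```

Raising the threshold narrows candidate selection; lowering it admits rarer variants and therefore more buckets.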

Using GNRMETA is similar to using the EQMETA function with an equivalency string code (equistrcode) of NICKNAME. EQMETA, with NICKNAME, looks up the various nickname forms of a token and then passes them through the META function. With GNRMETA, the lookup is done with GNR instead of a NICKNAME table.

There are two data derivation arguments (dvdArgs) used with GNRMETA. The first is the phonetic function. The second is the percentage threshold value, which is specified as percent=value. The value must be an integer. For example, IDENTAPHONE, percent=10.
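
To make the argument format concrete, the following Python sketch splits a string such as IDENTAPHONE, percent=10 into its two parts and enforces the integer rule. How the operational server actually parses these arguments is not documented here; the function parse_gnrmeta_args is hypothetical and only mirrors the rules stated above.

```python
def parse_gnrmeta_args(arg_string):
    """Split 'PHONETIC, percent=N' into (phonetic_function, threshold).

    Hypothetical helper: it mirrors the documented rules (two arguments,
    integer percent) but is not the product's own parser.
    """
    parts = [p.strip() for p in arg_string.split(",")]
    if len(parts) != 2:
        raise ValueError("expected two arguments: phonetic function, percent=value")

    phonetic, threshold = parts
    key, sep, value = threshold.partition("=")
    if not sep or key.strip() != "percent":
        raise ValueError("second argument must have the form percent=value")

    value = value.strip()
    # The value must be an integer, so reject anything that is not plain digits.
    if not (value.isascii() and value.isdigit()):
        raise ValueError("percent value must be an integer")

    return phonetic, int(value)


print(parse_gnrmeta_args("IDENTAPHONE, percent=10"))  # ('IDENTAPHONE', 10)
```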

GNRMETA is a bucket generation function that can be used with the existing virtual MDM standardization, comparison, and phonetic functions. You must use either PXNM or BXNM bucketing functions.

The processing steps are as follows:
  1. Input consists of name data.
  2. The input is standardized by using a virtual MDM standardization function. The output is a comparison (cmpd) string.
  3. GNRMETA calls GNR for name variants of the output.
  4. The variant results are used to create buckets.
  5. The bucket data is then used to select candidates for comparison and matching.

The key benefit of this integration is that it offers more results during candidate selection. For example, when you use EQMETA with NICKNAME and the input name "Omar" (nickname "Umar"), you get one bucket, OMR. With GNRMETA, you get three buckets: AMR, OMR, and UMR.
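
The processing steps and the "Omar" example can be traced end to end with a small Python sketch. Everything in it is a stand-in: standardize, fake_gnr_variants, and toy_phonetic are hypothetical placeholders for the virtual MDM standardization function, the GNR analyze() call, and the configured phonetic function, and the variant percentages are invented. The toy phonetic key (first letter plus remaining consonants) is chosen only because it reproduces the bucket values quoted above.

```python
VOWELS = set("AEIOU")


def standardize(name):
    """Step 2 stand-in: produce a normalized comparison token."""
    return name.strip().upper()


def fake_gnr_variants(token):
    """Step 3 stand-in: return (variant, percentage) pairs.

    The table and its percentages are invented; real variants come from GNR.
    """
    table = {"OMAR": [("OMAR", 60), ("UMAR", 25), ("AMAR", 15)]}
    return table.get(token, [(token, 100)])


def toy_phonetic(token):
    """Toy phonetic key: keep the first letter, drop later vowels.

    This is NOT the META or IDENTAPHONE algorithm.
    """
    if not token:
        return ""
    return token[0] + "".join(c for c in token[1:] if c not in VOWELS)


def gnrmeta_buckets(name, percent):
    """Steps 1-4: standardize, fetch variants, filter, build bucket keys."""
    token = standardize(name)
    kept = [v for v, pct in fake_gnr_variants(token) if pct >= percent]
    return sorted({toy_phonetic(v) for v in kept})


print(gnrmeta_buckets("Omar", 10))  # ['AMR', 'OMR', 'UMR']
print(gnrmeta_buckets("Omar", 50))  # ['OMR']
```

In this sketch a record stored under any of the three bucket keys becomes a candidate for comparison (step 5), which is how the extra variants widen candidate selection.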

Implementing this feature can have some effect on performance, but the effect is minimal.

Last updated: 27 June 2014