API and Migration Changes for IBM InfoSphere Global Name Recognition, Version 4.1

Product documentation


Abstract

This document covers changes to NameHunter and IBM NameWorks APIs that are associated with using IBM Global Name Recognition Version 4.1 products. The goal of this document is to provide concise information that is required to upgrade solutions and applications based on GNR product releases prior to Version 4.1.

Content




NameHunter API Changes: Version 3.2 to Version 4.1

Key Summary Points

  • The new Name object replaces string values as a way to represent a name in various API calls. Name objects are required to exchange name data with NameHunter when searching and comparing Organization names.
  • NameHunter APIs from GNR Version 3.2 have been retained for backward compatibility.
  • New APIs are provided to get and set the default values that are associated with NameHunter comparison parameters (CompParms).

Key addition to the NameHunter API: the Name object

A Name object provides a single instance of a personal name (PN) or an organization name (ON) and is used to:

  • Add a name to a search list
  • Search for a name in a search list
  • Compare two names
  • Return search matches

You must construct a Name object before using any of the following NameHunter API functions:
  • Creating a name based on its name category (PN or ON)
  • Parsing a PN into its given name (GN) and surname (SN) constituents before creating a Name object
  • Identifying the culture category for the GN and SN components of a PN separately; the culture category is set to Ambiguous/Generic if not otherwise specified

Note: The culture category for an ON is always set to Ambiguous/Generic, even if a different setting is provided independently.
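
The culture defaulting rules above can be sketched as follows. This is a minimal illustration, not the NameHunter API: NameSketch, makePersonalName(), makeOrganizationName(), and the enum values are hypothetical names invented for this example, and the real Name class and its constructors may look quite different.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; the real NameHunter types may differ.
enum class NameCategory { Personal, Organization };
enum class Culture { AmbiguousGeneric, Anglo, Arabic };

struct NameSketch {
    NameCategory category;
    std::string givenName;  // GN; left empty for an ON
    std::string surname;    // SN for a PN, or the full name for an ON
    Culture gnCulture;
    Culture snCulture;
};

// Build a PN from already-parsed GN and SN parts. Each culture falls back to
// Ambiguous/Generic when the caller does not specify one.
inline NameSketch makePersonalName(const std::string& gn, const std::string& sn,
                                   Culture gnCulture = Culture::AmbiguousGeneric,
                                   Culture snCulture = Culture::AmbiguousGeneric) {
    assert(!gn.empty() || !sn.empty());  // at least one name part is needed
    return NameSketch{NameCategory::Personal, gn, sn, gnCulture, snCulture};
}

// Build an ON. A requested culture is accepted but ignored, so the result is
// always Ambiguous/Generic, mirroring the note above.
inline NameSketch makeOrganizationName(const std::string& fullName,
                                       Culture /*requested*/ = Culture::AmbiguousGeneric) {
    assert(!fullName.empty());
    return NameSketch{NameCategory::Organization, "", fullName,
                      Culture::AmbiguousGeneric, Culture::AmbiguousGeneric};
}
```

In this sketch, makeOrganizationName("Acme Holdings", Culture::Anglo) still yields a name whose culture is Ambiguous/Generic.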


Search lists in GNR Version 4.1

The need to differentiate between PN and ON search logic in GNR requires NameHunter to create two or three physical lists in memory for each logical data list that is specified in the associated configuration file. One list is created for each of the following data types, and each list is transparent to the client application:

  • Name objects marked as PNs
  • Name objects marked as ONs
  • PNs that have been reformatted to allow ON-to-PN matching by setting the NameHunter configuration parameter OnToPnSearch to true (the default setting for this parameter is false)

The add() method is used to populate a search list. The following list summarizes the changes to the add() method in GNR Version 4.1:
  • The original form of the name is an optional element
  • Each ON that is added to the search list is subject to ON regularization rules (mandatory)
  • Each PN that is added to the search list is regularized if this feature is requested for the search list in the associated NameHunter configuration file
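
One way to picture the physical lists and the add() rules is the sketch below. It is an assumption-laden model rather than NameHunter code: LogicalListSketch, Entry, and regularizeSketch() are hypothetical, the uppercase conversion only stands in for real regularization rules, and the way PNs are reformatted for ON-to-PN matching is not described in this document, so the third list simply reuses the same placeholder.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class NameCategory { Personal, Organization };

struct Entry {
    NameCategory category;
    std::string text;
};

// Placeholder for regularization: uppercases ASCII letters only.
// Real regularization rule sets do far more than this.
inline std::string regularizeSketch(const std::string& s) {
    std::string out;
    for (unsigned char c : s) out.push_back(static_cast<char>(std::toupper(c)));
    return out;
}

class LogicalListSketch {
public:
    LogicalListSketch(bool regularizePn, bool onToPnSearch)
        : regularizePn_(regularizePn), onToPnSearch_(onToPnSearch) {}

    void add(const Entry& e) {
        assert(!e.text.empty());
        if (e.category == NameCategory::Organization) {
            onList_.push_back(regularizeSketch(e.text));  // ONs are always regularized
            return;
        }
        // PNs are regularized only when the list configuration asks for it.
        pnList_.push_back(regularizePn_ ? regularizeSketch(e.text) : e.text);
        // Third physical list, kept only when ON-to-PN matching is enabled.
        if (onToPnSearch_) pnAsOnList_.push_back(regularizeSketch(e.text));
    }

    std::size_t physicalListCount() const { return onToPnSearch_ ? 3u : 2u; }
    const std::vector<std::string>& pnList() const { return pnList_; }
    const std::vector<std::string>& onList() const { return onList_; }
    const std::vector<std::string>& pnAsOnList() const { return pnAsOnList_; }

private:
    bool regularizePn_;
    bool onToPnSearch_;
    std::vector<std::string> pnList_, onList_, pnAsOnList_;
};
```

The client sees one logical list; the split into two or three containers stays behind the add() call.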

The search() method supports precise control over the scope of each search transaction. The following list summarizes the changes to the search() method in GNR Version 4.1:
  • A new parameter, srchTypeBitmap, is added for the search() method
  • The following values are valid for this parameter:
    1: search PNs only in this search list
    2: search ONs only in this search list
    3: search both PNs and ONs in this search list
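
The three values behave like a two-bit mask, which the following sketch makes explicit. Only the numeric values 1, 2, and 3 come from this document; the constant and function names are hypothetical and are not taken from the NameHunter headers.

```cpp
#include <cassert>

// Hypothetical names for the documented srchTypeBitmap values.
enum SearchScope : unsigned {
    kSearchPn = 1u,                      // search PNs only
    kSearchOn = 2u,                      // search ONs only
    kSearchBoth = kSearchPn | kSearchOn  // 3: search both PNs and ONs
};

inline bool searchesPn(unsigned srchTypeBitmap) { return (srchTypeBitmap & kSearchPn) != 0u; }
inline bool searchesOn(unsigned srchTypeBitmap) { return (srchTypeBitmap & kSearchOn) != 0u; }

// Only 1, 2, and 3 are listed as valid values.
inline bool isValidScope(unsigned srchTypeBitmap) {
    return srchTypeBitmap >= 1u && srchTypeBitmap <= 3u;
}

// Build the bitmap from two flags; at least one list must be selected.
inline unsigned scopeFor(bool includePn, bool includeOn) {
    assert(includePn || includeOn);  // 0 is not a documented value
    return (includePn ? kSearchPn : 0u) | (includeOn ? kSearchOn : 0u);
}
```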

Searching with Name objects

NameHunter NameScore() and NameMatch() APIs have been overloaded so that they can accept two Name objects as input arguments. When conducting a search, Name object categories can be the same (PN-PN, ON-ON) or different (PN-ON). The interpretation of two CompParms arguments is conditioned by the category of the Name objects that are supplied as input:

  • If the left name is a PN, then left CompParms is applied to GN comparisons, and right CompParms is applied to SN comparisons
  • If either name is an ON, then left CompParms values override default CompParms that are applied to ON matches
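
The two rules can be summarized as a small decision function. This sketch rests on assumptions the text does not settle: it checks the ON rule first, so a PN-ON pair is treated as an ON match, and it marks the right CompParms as "not specified" for ON matches because this document does not say how they are used. All type and function names are hypothetical.

```cpp
#include <cassert>

enum class NameCategory { Personal, Organization };

// What a CompParms argument is used for in a given comparison.
enum class ParmsRole {
    GivenNameComparison,
    SurnameComparison,
    OrganizationOverride,
    NotSpecified  // this document does not describe the role
};

struct ParmsRoles {
    ParmsRole left;
    ParmsRole right;
};

inline ParmsRoles rolesFor(NameCategory leftName, NameCategory rightName) {
    // Assumption: the ON rule takes precedence when the categories differ.
    if (leftName == NameCategory::Organization || rightName == NameCategory::Organization) {
        return ParmsRoles{ParmsRole::OrganizationOverride, ParmsRole::NotSpecified};
    }
    // Both names are PNs: left applies to GN, right applies to SN.
    return ParmsRoles{ParmsRole::GivenNameComparison, ParmsRole::SurnameComparison};
}

// Example check for the PN-PN case.
inline void checkPersonalPair() {
    const ParmsRoles r = rolesFor(NameCategory::Personal, NameCategory::Personal);
    assert(r.left == ParmsRole::GivenNameComparison);
    assert(r.right == ParmsRole::SurnameComparison);
}
```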

Overriding default CompParm settings

Each supported culture category in NameHunter is associated with a group of default CompParms settings. New APIs allow these default settings to be checked and modified:

static CompParms getDefaultParms(Culture culture, NameFieldType fieldType)

  • Recover current default settings for a specific culture


static void overrideDefaultParms(CompParms& newparms)
  • Establish new default settings for some or all CompParms that are associated with a specific culture:
    • Culture identifier and name field (GN or SN) values in CompParms structure cannot be modified by these APIs
    • Changes persist for the duration of the NameHunter session at the culture level
    • Further changes to one or more CompParms can be made on a per-transaction basis, such as in NameScore() and NameMatch() APIs
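
The session-level behavior described above can be modeled with a small registry keyed by culture and name field. This is a sketch under stated assumptions, not the NameHunter implementation: CompParmsSketch, its matchThreshold member, the 0.80 built-in value, and the range check are all invented for illustration, and the real functions are static rather than members of a registry object.

```cpp
#include <cassert>
#include <map>
#include <utility>

enum class Culture { AmbiguousGeneric, Anglo, Arabic };
enum class NameFieldType { GivenName, Surname };

// Hypothetical subset of comparison parameters.
struct CompParmsSketch {
    Culture culture;          // identifies the default; not itself changed by an override
    NameFieldType fieldType;  // GN or SN; likewise fixed
    double matchThreshold;    // invented tunable value
};

class DefaultParmsRegistry {
public:
    // Return the current session default, creating a built-in one on first use.
    CompParmsSketch getDefaultParms(Culture culture, NameFieldType fieldType) {
        const Key key = std::make_pair(culture, fieldType);
        std::map<Key, CompParmsSketch>::iterator it = defaults_.find(key);
        if (it == defaults_.end()) {
            const CompParmsSketch builtIn = {culture, fieldType, 0.80};
            it = defaults_.insert(std::make_pair(key, builtIn)).first;
        }
        return it->second;
    }

    // Replace the default selected by the culture and field inside newParms.
    // The change lasts as long as this registry object (the "session").
    void overrideDefaultParms(const CompParmsSketch& newParms) {
        assert(newParms.matchThreshold >= 0.0 && newParms.matchThreshold <= 1.0);
        defaults_[std::make_pair(newParms.culture, newParms.fieldType)] = newParms;
    }

private:
    typedef std::pair<Culture, NameFieldType> Key;
    std::map<Key, CompParmsSketch> defaults_;
};
```

A typical pattern is to read the current default, change one value, and write it back; other cultures keep their built-in defaults.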

Build resources

The libraries that are required for building client applications remain the same as in GNR Version 3.2:

  • Unix/Linux/AIX: libNameHunter.a
  • Windows: NameHunter.lib

One new header file, gnrexcept.h, can be included to expose the details of exceptions that are generated by API calls.


Improvements and changes in name scoring techniques

  • Names that match perfectly receive a perfect score (1.0) only if neither name has been regularized
  • Two names with the first name unknown (FNU) in the given name (GN) field of a PN no longer receive a perfect score
  • Blank or missing surname (SN) values in a PN match are treated as no last name (NLN) instead of last name unknown (LNU) for cultures where single field names are still in use, such as in Indonesia
  • Comparisons of GNs or SNs that contain differing TAQ values are now scored more accurately
  • Affixes do not cause the anchor factor in a GN or SN to be applied incorrectly
  • GN and SN variants from all constituent cultures are applied when a GN or SN is associated with a roll-up or group culture, such as Southwest Asian

Miscellaneous NameHunter improvements
  • Name pair comparisons can now use two different regularization rule sets, one for each PN being compared
  • Improved threadsafe capabilities for NameHunter APIs
  • Improved error detection and error handling in NameHunter Server
  • Inherent TAQ and variant values can be overridden by user-supplied values

    NameHunter Distributed Search (NH-DS):

    • Set Parameters message (P) is no longer supported because parameter overrides are now included in the Search (S) message
    • TAQ file format has changed to improve accuracy of scoring for names with differing TAQ values

    NamePreprocessor (NPP) upgrades:
    • NameClassifier – Country of Association (NC-COA) is used in place of NameClassifier for PN classification to improve accuracy and consistency
    • Names that are input to NPP can be in full format because NPP can call NameParser for unparsed input names

    Diagnostic scoring utility (why):
    • Custom TAQ and variant files are supported

Packaging changes in GNR products
  • IBM InfoSphere Global Name Analytics bundle now includes NameSifter
  • NameInspector, a Windows-only sample application, is removed
  • Source code for C++ sample programs for low-level components is removed

IBM NameWorks API Changes: Version 3.2 to Version 4.1

Key Summary Points

The C++ interface to IBM NameWorks is now public. The C++ APIs are the definitive versions of the IBM NameWorks APIs, whereas the Java APIs are a thin, generally transparent “wrapper” layer over the C++ interface. The following example illustrates the differences between the C++ and Java versions of the analyzeForSearch() API:

C++:
vector<QueryName> qnames;
scoring.analyzeForSearch(name, 70, qnames);

Java:
List<QueryName> qnames =
scoring.analyzeForSearch(name, 70);

Other wrappers can be created for the C++ layer, such as managed C# for .NET, Ruby extensions, PHP extensions, or Perl extensions. The following upgrades apply to the IBM NameWorks Version 4.1 APIs:

  • The Name object changes the interface style for some of the IBM InfoSphere Global Name Scoring APIs
  • Thread and socket controls available in the IBM NameWorks configuration file allow more control over system resource usage in IBM NameWorks-based applications
  • Expanded IBM NameWorks configuration file settings support per-search modification of predefined search strategies throughout a session
  • IBM NameWorks Embedded Search (NW-ES) now supports API-level calls to NameHunter for same-process search capability
  • Support continues for IBM NameWorks use of NameHunter Distributed Search (NH-DS) as an outboard search process, accessed via XML/IP connectivity

IBM NameWorks Analytics APIs
  • The IBM NameWorks Analytics Java APIs are essentially unchanged in GNR Version 4.1
  • A method is added for the categorize() API to determine the name type (either personal or organization)
  • The variant generation API is renamed from getVariants() to generate()

IBM NameWorks Scoring APIs
  • Exposes changes in the underlying functionality of the NameHunter search component
  • The Name object is introduced in parallel with its use in the NameHunter APIs
    • Combines name category, name structure, and name culture data
    • Returned by analyzeForSearch() and createName() APIs
    • Used as input to most other IBM NameWorks Scoring methods
  • Simplified search() method handles both PN and ON searches and accepts a Name object as input
  • The compare() API is enhanced to support user-defined culture specification (when Name objects are used as input) and search strategies:

compare(LeftName, RightName, String strategy)
  • Comparison Parameters (CompParms) accessibility is enhanced so that user-supplied session-level defaults for one or more cultures can override inherent product defaults:
    • CompParmsDefaults= entry in the [General] section of the IBM NameWorks configuration file
    • Session-level defaults are referenced when IBM NameWorks performs pair-wise comparison or uses IBM NameWorks Embedded Search
  • Enhanced configuration file options provide per-transaction control over predefined search strategies:

Parameter      Definition
MinScore=      Minimum match score
MaxReplies=    Maximum number of matches
SearchOpt=     Search options (include ONs when searching for a PN)
IncludeTaqs=   Include or exclude Titles, Affixes, and Qualifiers (TAQs) in search comparisons
[ONParms]      CompParm overrides for organization name scoring

Note: Relative CompParm overrides change the existing value by a percentage.
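
The note on relative overrides amounts to simple percentage arithmetic, sketched below. The exact syntax and sign convention of relative overrides in the configuration file are not given in this document, so this example assumes that a value of +10 means "raise the existing value by 10 percent" and -25 means "lower it by 25 percent". Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumed convention: percentChange = +10 raises the value by 10 percent.
inline double applyRelativeOverride(double currentValue, double percentChange) {
    assert(percentChange >= -100.0);  // a larger decrease would flip the sign
    return currentValue * (1.0 + percentChange / 100.0);
}

// An absolute override simply replaces the existing value.
inline double applyAbsoluteOverride(double /*currentValue*/, double newValue) {
    return newValue;
}
```

Under this assumption, a parameter whose current value is 0.50 becomes 0.55 after a relative override of +10, whereas an absolute override of 0.10 would set it to 0.10.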


IBM NameWorks Embedded Search (NW-ES)

  • Maps NameHunter search support from NH-DS to an in-process API call
  • Transparent to client applications: no changes to any public IBM NameWorks APIs
  • Controlled entirely through configuration options:
    • [Datalist:Sample]
    • Type=0
    • List=Sample1.csv
    • List=Sample2.csv|add
    • TAQ=taq.ibm
    • GNV=gnv.ibm
    • SNV=snv.ibm
    • VAR=fieldVar.ibm
    • CompressedBitSig=1
    • OnToPn=1
    • PNREG=angloRegRule.ibm,Anglo
    • PNREG=arabicRegRule.ibm,Arabic
    • PNREG=germanRegRule.ibm,German
    • PNREG=indianRegRule.ibm,Indian
    • PNREG=russianRegRule.ibm,Russian
    • PNREG=thaiRegRule.ibm,Thai
    • ONREG=genericOnRegRule.ibm,Generic
  • Allows “mix-and-match” support for GNR name search, using both local and remote processes
  • Lower marshalling overhead makes NW-ES preferable for smaller search lists and high associated transaction volumes
  • Unique name search is available in NH-DS only and is not supported in NW-ES
    • List can be made unique outside of NW-ES if the client can handle linkage between unique names and original name list
  • Can operate in federated mode like NH-DS
  • Loading of embedded data lists into NW-ES includes all name preprocessing:
    • Transliteration
    • Categorization
    • Parsing (including addition of alternate parses)
    • Culture classification
  • Preprocessing steps in IBM NameWorks are selectively disabled if valid required data is provided in the IBM NameWorks input files
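
For the case noted above, where a list is made unique outside of NW-ES, the client needs to keep the linkage between each unique name and the original records. A minimal sketch of that bookkeeping follows. It assumes that two names are duplicates only when their strings are identical, which may be stricter or looser than the NH-DS unique name search; UniqueList and makeUnique() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct UniqueList {
    std::vector<std::string> uniqueNames;                 // list to load for searching
    std::vector<std::vector<std::size_t> > originalRows;  // originalRows[i]: rows that share uniqueNames[i]
};

// Collapse exact duplicates, keeping first-seen order and remembering
// which rows of the original list each unique name came from.
inline UniqueList makeUnique(const std::vector<std::string>& names) {
    UniqueList out;
    std::map<std::string, std::size_t> indexOf;  // name -> position in uniqueNames
    for (std::size_t row = 0; row < names.size(); ++row) {
        std::map<std::string, std::size_t>::iterator it = indexOf.find(names[row]);
        if (it == indexOf.end()) {
            it = indexOf.insert(std::make_pair(names[row], out.uniqueNames.size())).first;
            out.uniqueNames.push_back(names[row]);
            out.originalRows.push_back(std::vector<std::size_t>());
        }
        out.originalRows[it->second].push_back(row);
    }
    assert(out.uniqueNames.size() == out.originalRows.size());
    return out;
}
```

When a search returns a match at position i of the unique list, the client can expand it back to every original record through originalRows[i].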

Original publication date

2009/4/17

Document information


More support for:

InfoSphere Global Name Management
InfoSphere Global Name Recognition

Software version:

4.1

Operating system(s):

AIX, Linux, Linux Red Hat - zSeries, Linux SUSE - zSeries, Linux/x86, Solaris, Windows

Reference #:

7015553

Modified date:

2009-04-17
