Product documentation
Abstract
This document summarizes fixes and corrective enhancements for InfoSphere Global Name Management, Version 4.2.
Content
InfoSphere Global Name Management, Version 4.2 is now available. This release includes the following significant corrections and resolved defects.
NameHunter Server
Handling of the text format of floating-point numbers was not sensitive to system locale
When the decimal separator character is set to the comma (,) character (as in many countries), both the parsing and the generation of the text format of floating-point numbers are no longer disrupted. InfoSphere Global Name Recognition components no longer expect a period (.) as the decimal separator.
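As a rough illustration of locale-aware handling, the following Python sketch parses and generates the text format of a floating-point number with a configurable decimal separator. The function names and parameters are hypothetical and are not part of any InfoSphere API; the sketch shows the general idea only, not how the product implements the fix.

```python
def parse_float(text, decimal_sep="."):
    """Parse a decimal number written with the given separator (hypothetical helper)."""
    text = text.strip()
    if decimal_sep != ".":
        # Reject input that mixes separators instead of guessing what it means.
        if "." in text:
            raise ValueError("unexpected '.' in %r" % text)
        text = text.replace(decimal_sep, ".", 1)
    return float(text)


def format_float(value, decimal_sep=".", digits=2):
    """Generate the text format of a number with the given separator (hypothetical helper)."""
    return f"{value:.{digits}f}".replace(".", decimal_sep)


# A threshold written with a comma as the decimal separator.
threshold = parse_float("0,85", decimal_sep=",")
print(format_float(threshold, decimal_sep=","))
```

Passing the separator explicitly keeps the sketch independent of the locales installed on the system; a real component would more likely read the separator from the active locale.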
NameHunter (LAS::NH class)
The regularizeScoreMax variable was applied to strings that were already exact matches before regularization (so exact matches scored less than 1.0)
The regularizeScoreMax variable now prevents strings that were not identical before regularization from receiving a score of 1.0 if they are identical after regularization. For example, if MOHAMAD and MOHAMMOUD both regularize to MOHAMED, they do not receive a score of 1.0, which preserves the fact that the names were not identical originally.
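The corrected behavior can be sketched in Python as follows. The regularization table, the compare function, the 0.99 cap, and the use of difflib as a stand-in similarity measure are all assumptions made for illustration; the actual NameHunter scoring algorithm and regularization rules are not described in this document.

```python
from difflib import SequenceMatcher

# Hypothetical regularization table; the real rules are supplied by the product.
REGULARIZED = {"MOHAMAD": "MOHAMED", "MOHAMMOUD": "MOHAMED"}


def regularize(name):
    """Return the regularized form of a name, or the name itself if no rule applies."""
    return REGULARIZED.get(name, name)


def compare(a, b, regularize_score_max=0.99):
    """Score two names; the cap applies only to names that were not identical originally."""
    if a == b:
        # Exact match before regularization: the cap must not be applied.
        return 1.0
    ra, rb = regularize(a), regularize(b)
    if ra == rb:
        # Identical only after regularization: score is held below 1.0.
        return regularize_score_max
    # Stand-in similarity for all other pairs (an assumption, not the product's measure).
    return min(SequenceMatcher(None, ra, rb).ratio(), regularize_score_max)


print(compare("MOHAMAD", "MOHAMMOUD"))  # 0.99
print(compare("MOHAMAD", "MOHAMAD"))    # 1.0
```

Checking for an exact match before regularizing is what keeps the cap from lowering the score of names that were identical to begin with.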
IBM NameParser
Alternate parse did not work when the surname was not in the Name Data Object (NDO)
NameParser now returns an alternate parse when you pass it a name whose surname is not in the NDO but whose given name is an obvious surname.
Unexpected behavior in getNameStats() function when input string contains a comma
When getNameStats() is called with an input string that contains a comma, the name segments are no longer re-ordered.
In addition, InfoSphere Global Name Recognition, Version 4.2 includes the following enhancements that improve performance and effectiveness.
Enhanced support for organization names
You can search for, compare, and modify organization names that occur in your data. NameHunter now processes terms with embedded white space (multi-token terms) with the following benefits:
- Recognize that multi-token terms are one entity
- Reduce the number of false positives returned by organization name searches
- Implement organization acronyms. For example, IBM is the same as International Business Machines
Related NameHunter enhancements include:
- Better variant files and a simpler variant file format
- Reduced organization name regularization, which consumed excessive CPU resources at load time
- Reduced memory use and increased query speed
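The multi-token term and acronym handling described in this section can be pictured with the following Python sketch. The variant table, its layout, and the canonical_tokens function are hypothetical and do not reflect the actual NameHunter variant file format; the sketch only shows how a term with embedded white space can be recognized as one entity and matched to its acronym.

```python
# Hypothetical variant table: each key is a tuple of tokens, each value a canonical term.
VARIANTS = {
    ("INTERNATIONAL", "BUSINESS", "MACHINES"): "IBM",
    ("IBM",): "IBM",
}
MAX_TERM_LEN = max(len(key) for key in VARIANTS)


def canonical_tokens(name):
    """Split an organization name into tokens, folding known multi-token terms into one."""
    tokens = name.upper().split()
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-token terms win over single tokens.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in VARIANTS:
                result.append(VARIANTS[key])
                i += n
                break
        else:
            # No known term starts here; keep the token unchanged.
            result.append(tokens[i])
            i += 1
    return result


print(canonical_tokens("International Business Machines Corp"))  # ['IBM', 'CORP']
print(canonical_tokens("IBM Corp"))                               # ['IBM', 'CORP']
```

Because both spellings reduce to the same token list, a comparison of the two names sees one shared entity instead of three unrelated words.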
Expanded and enhanced IBM NameParser APIs
The following improvements to parsing and performance for IBM NameParser are included in this release:
- More effective parsing of Chinese and Korean names
- Support for digits, punctuation, and spaces
- Better reparse logic
- Reduced errors
For a complete list of InfoSphere Global Name Recognition, Version 4.2 features and enhancements, see http://publib.boulder.ibm.com/infocenter/gnrgnm/v4r2m0/topic/com.ibm.iis.gnm.overview.doc/topics/gnr_con_gnmwhatsnewv42_productfeatures.html.
Copyright and trademark information
IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.