Common name search issues with IBM NameHunter and Arabic names.

Technote (FAQ)


Question

ISSUE 1: Searching for known aliases of FAHID MOHAMMED ALLY MSALAM returns seven names:

FAHID MOHAMMED ALLY MSALAM
FAHID MOHAMMED ALLY
FAHAD ALLY MSALAM
FAHID MOHAMMED ALI MSALAM
FAHID MOHAMMED ALI MUSALAAM
FAHID MUHAMMAD ALIM SALAM
MOHAMMED K.A ALLY MSALAM

Searching on one of the identified variants, "FAHID MUHAMMAD ALIM SALAM," returned one additional variant beyond the previous seven:

FAHID MUHAMAD ALI SALEM

Two of the known aliases remain undetected:

FAHID MUHAMMAD ALI
USAMA AL-KINI

ISSUE 2: Searching on "FAZUL ABDALLAH MOHAMMED" does not return all known name variants.

The following name variants were found:

MOHAMMED FAZUL ABDALLAH
MOHAMMED ABDALLAH FAZUL
ABDALLAH FAZUL MOHAMMED

The following name variants were not found:

ABDALLAH MOHAMMED FAZUL
FAZUL MOHAMMED ABDALLAH

However, a "first-run" search on "Abdallah Fazul Mohammed" does return this name variant:

Fazul Mohammed Abdallah

Answer

Resolution to ISSUE 1

The name USAMA AL-KINI is an alias used by the person also known as FAHID MOHAMMED ALLY MSALAM. The alias has no relation to the query name: neither USAMA nor AL-KINI is a variant form of any of the name tokens in the query. GNR (Global Name Recognition) name-matching technology recognizes strings that bear some kind of phonological, orthographic, or semantic similarity, such as MUHAMMED/MHEMID or ELIZABETH/BETTY. Linking unrelated names that happen to be used by a single person is outside the scope of the GNR products. Customers can add this functionality through the database system that they use.
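
As an illustration of linking unrelated aliases outside GNR, the following minimal sketch keeps an alias table in the customer's own database and looks up the other names recorded for the same person after a name match. The table name, the person_id column, and the linked_aliases helper are hypothetical and are not part of any GNR product; SQLite stands in for whatever database system is in use.

```python
import sqlite3

# Hypothetical alias table: every name used by one person shares a person_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alias (person_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO alias VALUES (?, ?)",
    [
        (1, "FAHID MOHAMMED ALLY MSALAM"),
        (1, "USAMA AL-KINI"),
    ],
)


def linked_aliases(conn, matched_name):
    """Return the other names stored for the person who uses matched_name."""
    rows = conn.execute(
        "SELECT a2.name FROM alias a1 "
        "JOIN alias a2 ON a1.person_id = a2.person_id "
        "WHERE a1.name = ? AND a2.name <> ? "
        "ORDER BY a2.name",
        (matched_name, matched_name),
    ).fetchall()
    return [row[0] for row in rows]


# After the name search returns a hit, expand it with the stored aliases.
print(linked_aliases(conn, "FAHID MOHAMMED ALLY MSALAM"))
```

With this approach the name search finds the similar spellings, and the alias table supplies names that have no phonological or orthographic relation to the query.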

One of the difficulties in working with Arabic names is that there is often no single element in a name that is consistently used as a surname across the whole region. In the name FAHID MOHAMMED ALLY MSALAM and its variants shown here, any of the name tokens can be a legitimate given name or surname. That is, one person might have the given name FAHID and might use MOHAMMED as his surname, while another might be called MOHAMMED and have the surname FAHID.

GNR uses an automatic name parsing algorithm that weighs the relative field frequency of the tokens found in a name. When similar names have a different number of name tokens, they can be parsed differently into given name and surname fields. If two entries then have different names in the surname fields (for example, if one entry has SALAM in the surname field, while another has ALI), the entries might not match each other if they fail to meet the surname threshold score setting.
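
The effect of token count on parsing can be sketched with a toy parser. This is not the GNR algorithm: the SURNAME_WEIGHT values are invented purely for illustration, and the scoring rule (try every split point and keep the best-scoring one) is an assumption made to show how two similar names can end up with different surname fields.

```python
# Invented, illustrative weights: how strongly each token leans toward the
# surname field (0 = always given name, 1 = always surname). Not GNR data.
SURNAME_WEIGHT = {
    "FAHID": 0.2,
    "MUHAMMAD": 0.3,
    "ALIM": 0.4,
    "ALI": 0.5,
    "SALAM": 0.6,
}


def split_score(tokens, i):
    """Score the parse tokens[:i] = given name, tokens[i:] = surname."""
    given = sum(1 - SURNAME_WEIGHT.get(t, 0.5) for t in tokens[:i])
    surname = sum(SURNAME_WEIGHT.get(t, 0.5) for t in tokens[i:])
    return given + surname


def parse(name):
    """Return (given_name, surname) for the best-scoring split point."""
    tokens = name.split()
    if len(tokens) < 2:
        return ("", name)
    best = max(range(1, len(tokens)), key=lambda i: split_score(tokens, i))
    return (" ".join(tokens[:best]), " ".join(tokens[best:]))


print(parse("FAHID MUHAMMAD ALI"))
print(parse("FAHID MUHAMMAD ALIM SALAM"))
```

Under these assumed weights the first name is parsed with ALI in the surname field and the second with SALAM, which is the situation described above: the two entries are then compared surname against surname and can fall below the surname threshold.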

One strategy for working with such names is to set the surname threshold parameter to "0" (zero), and raise the given name and overall threshold parameters. This will allow names with non-matching surnames to match on the basis of given name similarity. The system administrators can experiment with different parameter settings. However, this type of search strategy might increase the number of false positives.
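
The threshold strategy can be sketched as three independent gates. Everything in this example is an assumption made for illustration: the Thresholds class, the 0-to-1 score scale, the way the names are split into fields, and the use of Python's difflib as a stand-in similarity measure. It does not reproduce GNR scoring or its actual parameter names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Thresholds:
    given_name: float
    surname: float
    overall: float


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; not the GNR scoring algorithm."""
    return SequenceMatcher(None, a, b).ratio()


def is_match(query, candidate, t):
    """query and candidate are (given_name, surname) pairs, already parsed."""
    gn_score = similarity(query[0], candidate[0])
    sn_score = similarity(query[1], candidate[1])
    overall = similarity(" ".join(query), " ".join(candidate))
    # A pair matches only if every score clears its own threshold.
    return (
        gn_score >= t.given_name
        and sn_score >= t.surname
        and overall >= t.overall
    )


# Assumed field assignment: SALAM and ALI land in the surname fields.
query = ("FAHID MUHAMMAD ALIM", "SALAM")
candidate = ("FAHID MUHAMMAD", "ALI")

strict = Thresholds(given_name=0.7, surname=0.7, overall=0.7)
relaxed = Thresholds(given_name=0.8, surname=0.0, overall=0.8)

print(is_match(query, candidate, strict))   # surname gate rejects the pair
print(is_match(query, candidate, relaxed))  # surname gate disabled
```

With the surname gate at zero, the pair matches on given name and overall similarity alone. The same relaxation also admits any other record whose given name is similar, which is the source of the additional false positives.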


Resolution to ISSUE 2

The match fails in this case for the same reason as noted above: the tokens are parsed into different fields from those in the query name.

The GNR parsing technology does not generate every possible name permutation. Rather, it considers frequency statistics in relation to the order in which the tokens are found in the record. This is because the syntax of a name (that is, the ordering of the tokens) is usually important in discriminating between different names, particularly for Arabic names, where the second token is normally the individual's father's name rather than the name of the individual himself. Generating all possible combinations would be likely to produce many false positives.
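
To give a sense of scale, the short sketch below enumerates every ordering of the three tokens in the ISSUE 2 query using only the Python standard library. It illustrates the combinatorics only and says nothing about how GNR selects candidates.

```python
from itertools import permutations
from math import factorial

tokens = "FAZUL ABDALLAH MOHAMMED".split()

# Every possible ordering of the query tokens.
orderings = [" ".join(p) for p in permutations(tokens)]
for name in orderings:
    print(name)

# The number of orderings grows factorially with the token count.
print(len(orderings), factorial(4), factorial(5))
```

Three tokens already yield six orderings, and a four-token name such as FAHID MOHAMMED ALLY MSALAM yields 24. Treating each ordering as an equivalent query would also equate names in which a person's own name and the father's name are swapped, which normally denote different people.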


Cross reference information
Segment: Information Management
Product: InfoSphere Global Name Management

Document information


More support for:

InfoSphere Global Name Management
InfoSphere Global Name Scoring

Software version:

2.1

Operating system(s):

AIX, Linux Red Hat - iSeries, Linux Red Hat - pSeries, Solaris, Windows

Reference #:

1247563

Modified date:

2013-05-14
