mpxcomp utility

The mpxcomp utility compares records and is one of the processes used during bulk cross matches (BXM) and incremental cross matches (IXM).

You run the mpxcomp utility in any of the following four situations:

During the initial stage of implementation to generate baseline comparison scores.
During the “reiterate” step of the implementation process. After going through the entire set of implementation steps and analyzing your data results, you might determine that modifications to your algorithm and data dictionary are necessary. If so, you typically rederive your data (by using the mpxdata, mpxfsdvd, or mpxredvd utilities) and run another BXM.
After implementation, if you modify the attributes that are used by your comparison functions (for example, by adding an alias to a name comparison) or change your bucketing configuration. Comparison function and bucket changes require new weights, a rederivation of data, and a new BXM.
When running an IXM.

The utility can be run from a command line or, preferably, from the InfoSphere® MDM Workbench jobset wizard.

When run, this utility selects candidates, compares member records, and assigns comparison scores. The mpxcomp utility must be run once for each type of entity (for example, identity and household) implemented, because the comparison algorithm is specific to each entity type.
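The three stages named above (candidate selection, comparison, scoring) can be pictured with a small Python sketch. This is an illustration only and not the mpxcomp algorithm: the bucketing rule, the record fields, and the one-point-per-matching-field score are all assumptions made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def bucket_keys(record):
    """Hypothetical bucketing rule: surname prefix and birth year."""
    return {("name", record["surname"][:3].lower()), ("year", record["birth_year"])}


def compare(a, b):
    """Toy comparison score: one point per attribute that matches exactly."""
    return sum(1 for field in ("surname", "given", "birth_year") if a[field] == b[field])


def cross_match(records):
    # Group member IDs by bucket so that only records sharing a bucket are paired.
    buckets = defaultdict(set)
    for rec_id, rec in records.items():
        for key in bucket_keys(rec):
            buckets[key].add(rec_id)

    # Candidate selection: every unordered pair that shares at least one bucket.
    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(sorted(members), 2))

    # Comparison and scoring: each candidate pair is scored once.
    return {pair: compare(records[pair[0]], records[pair[1]]) for pair in sorted(candidates)}


records = {
    1: {"surname": "Smith", "given": "Ann", "birth_year": 1980},
    2: {"surname": "Smithe", "given": "Ann", "birth_year": 1980},
    3: {"surname": "Jones", "given": "Bob", "birth_year": 1975},
}
scores = cross_match(records)
```

In this toy data, records 1 and 2 share buckets and are compared, while record 3 shares no bucket with any other record and is never paired. Because a real comparison algorithm is specific to an entity type, a run of this kind would be repeated per entity type with its own bucketing and comparison rules.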

Regarding system performance, the mpxcomp utility loads the entire input data set into memory for processing. Working with large data sets can cause memory issues. Your server must have sufficient contiguous memory to accommodate the data files. For large data sets, you can use the *Part options (-nMemParts, -nBktParts, -minBktPart, -maxBktPart, and -maxParts) to conserve system memory and optimize performance. These options partition the data so that the entire set is not pulled into memory at one time. To accommodate available memory, start by adjusting the -nBktParts option.
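The general idea behind partitioning can be sketched in Python as follows. This is a conceptual illustration under stated assumptions, not a description of how mpxcomp implements its *Part options: the CRC32-based assignment, the function names, and the row layout are all hypothetical.

```python
import zlib


def partition_of(bucket_key, n_parts):
    """Stable partition number for a bucket key (CRC32 is an illustrative choice)."""
    return zlib.crc32(bucket_key.encode("utf-8")) % n_parts


def iter_partitions(read_rows, n_parts):
    """Yield (partition, rows), holding only one partition in memory at a time.

    read_rows is a callable that returns a fresh iterator of
    (bucket_key, member_id) rows, standing in for re-reading a file on each pass.
    """
    for part in range(n_parts):
        rows = [row for row in read_rows() if partition_of(row[0], n_parts) == part]
        yield part, rows


def read_rows():
    # Hypothetical bucket rows; a real run would stream these from disk.
    return iter([("smi", 1), ("smi", 2), ("jon", 3), ("lee", 4), ("lee", 5)])


seen = []
for part, chunk in iter_partitions(read_rows, n_parts=3):
    # All rows of a given bucket land in the same chunk, so pairs are not lost.
    seen.extend(chunk)
```

The trade-off in a scheme like this is that more partitions lower the memory needed per pass but increase the number of passes over the input.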

If you plan to partition data, devise a partitioning strategy before beginning data derivation. Data must be partitioned consistently between the derivation step (mpxdata, mpxfsdvd, mpxprep, or mpxredvd utility), the comparison step (mpxcomp utility), and the linkage step (mpxlink utility).
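One way to guard against inconsistent partitioning is to record the settings planned for each step and compare them before any data is derived. The Python sketch below assumes, purely for illustration, that each step's settings are kept in a dictionary keyed by option name; the step labels and the idea that every step records the same keys are assumptions, not documented behavior of the utilities.

```python
def inconsistent_settings(plan):
    """Return {setting: {step: value}} for every setting that differs between steps."""
    names = set().union(*(settings.keys() for settings in plan.values()))
    problems = {}
    for name in sorted(names):
        values = {step: settings.get(name) for step, settings in plan.items()}
        if len(set(values.values())) > 1:
            problems[name] = values
    return problems


# Hypothetical plan: the linkage step uses a different bucket partition count.
plan = {
    "derivation": {"nMemParts": 4, "nBktParts": 8},
    "comparison": {"nMemParts": 4, "nBktParts": 8},
    "linkage": {"nMemParts": 4, "nBktParts": 4},
}
problems = inconsistent_settings(plan)
```

An empty result means that the recorded settings agree across all steps. Here the check reports the nBktParts mismatch in the linkage step.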

Before you run a utility, make sure that you have set the necessary operational server environment variables. For information about the variables, see the operational server environment variables topic.
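A pre-flight check for unset variables can be sketched in Python as shown below. The variable names in the example are placeholders, not the actual operational server variable names; substitute the names listed in the environment variables topic.

```python
import os


def missing_variables(required, env=None):
    """Return the names in required that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Placeholder names for illustration only.
required = ["EXAMPLE_HOME_DIR", "EXAMPLE_CONN_STRING"]
missing = missing_variables(required, env={"EXAMPLE_HOME_DIR": "/opt/example"})
```

Calling missing_variables(required) without the env argument checks the current process environment instead of the sample dictionary.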

Last updated: 2 Nov 2018