Information Management IBM InfoSphere Master Data Management, Version 10.1

mpxsort utility

The mpxsort utility is used to reorder a binary file generated from the bulk cross match (BXM) and incremental cross match (IXM) utilities.

Specifically, mpxsort reorders the bxmlink file when there are multiple member parts or multiple threads used during the creation of the bxmlink file. This sort order is required by the non-transitive logic to keep the transitive entity sets grouped together so that members can be removed from the set (and possibly form additional entities) for the non-transitive phase.
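
As a toy illustration of why the sort matters, the following Python sketch groups link records by entity. It is not the bxmlink format: the (entity_id, member_id) pairs, the two per-thread lists, and the group_entities helper are all hypothetical. It only shows that output written by several threads leaves the members of an entity scattered, and that sorting by entity identifier makes each set contiguous so it can be processed as a unit.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical link records (entity_id, member_id). The real bxmlink
# layout is binary and is not described in this topic.
thread_a = [(7, 101), (3, 205), (7, 309)]
thread_b = [(3, 412), (9, 518), (7, 620)]

# Concatenating per-thread output leaves each entity's members scattered.
unsorted_links = thread_a + thread_b


def group_entities(links):
    """Sort links by entity id and return {entity_id: [member_id, ...]}."""
    ordered = sorted(links, key=itemgetter(0))  # stable sort on entity id
    return {
        entity: [member for _, member in rows]
        for entity, rows in groupby(ordered, key=itemgetter(0))
    }


entity_sets = group_entities(unsorted_links)
print(entity_sets)
```

Without the sort, groupby would start a new group each time the entity identifier changes, so entity 7 would be split into several fragments.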

The mpxsort utility is run between the second mpxcomp phase and the mpxlink phase. The input to mpxsort is the output of the mpxcomp utility. When using the mpxsort utility, match the number of parts (-mpxparts) with the number of parts specified for the mpxcomp utility. The mpxsort output is then consumed by the mpxlink utility.

A command-line parameter unique to mpxsort is the -{no}radixSort option. A radix sort, also known as a binary sort, is an extremely fast method of sorting binary records. Although a radix sort is faster than a quick sort (which is our default sorting algorithm), it consumes twice as much memory as a quick sort. On servers where memory is a constraint, specify the -noradixSort option to use a quick sort and conserve memory. On servers where memory is not an issue and maximum performance is required, use the default -radixSort option.
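
The memory trade-off can be seen in a generic radix sort. The following Python sketch is a textbook least-significant-byte-first radix sort over fixed-width binary keys, not the mpxsort implementation; the function name, the record format, and the sample values are assumptions made for illustration. The point to notice is the second buffer (dst), which is as large as the input. Whether mpxsort uses the same scheme is not stated in this topic.

```python
def radix_sort_records(records, key_len):
    """Sort equal-length bytes records on their first key_len bytes.

    Generic LSD radix sort: one counting pass per key byte, starting
    with the last byte of the key. Each pass is stable.
    """
    src = list(records)
    dst = [None] * len(src)  # second buffer, same size as the input
    for pos in range(key_len - 1, -1, -1):
        # Count how many records carry each byte value at this position.
        counts = [0] * 256
        for rec in src:
            counts[rec[pos]] += 1
        # Turn the counts into starting offsets (exclusive prefix sums).
        total = 0
        for value in range(256):
            counts[value], total = total, total + counts[value]
        # Scatter records into the second buffer in stable order.
        for rec in src:
            value = rec[pos]
            dst[counts[value]] = rec
            counts[value] += 1
        src, dst = dst, src  # the buffers swap roles for the next pass
    return src


# Four-byte big-endian keys, so byte order matches numeric order.
sample = [n.to_bytes(4, "big") for n in (70000, 3, 512, 65536)]
result = [int.from_bytes(r, "big") for r in radix_sort_records(sample, 4)]
print(result)  # [3, 512, 65536, 70000]
```

A quick sort, by contrast, can rearrange records within the original buffer, which is why it is the usual choice when memory is tight.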

Again, the mpxsort utility supports only bxmlink files, which are the output of the mpxcomp utility.

Usage example:

mpxsort -enttype hh -bxmlink -bxminpdir /bxminp -bxmoutdir /bxmout

This example sorts the mpx_bxmlink_xx.XXX file for the household (hh) entity type.

All options and flags are case-insensitive; option values are case-sensitive.

The -nThreads option defaults to the number of processors on the server.

Table 1. mpxsort options
| Option | Type | Description | Default |
| --- | --- | --- | --- |
| -entType | name | Entity type name | NONE |
| -bxmInpDir | dirName | .bin file input directory | NONE |
| -bxmOutDir | dirName | .bin file output directory | NONE |
| -nMxmParts | N | Number of maximum partitions. Match this setting to the number of parts specified in the output of the BXM utility used to generate the file being used as input to the mpxsort utility. | 1 |
| -nThreads | N | Number of threads | The number of CPUs |
| -{no}bxmLink | | Use linkage records from the mpxcomp utility. Currently, the mpxsort utility supports only bxmlink files, which are the output of the mpxcomp utility. Use the -bxmLink option to avoid errors. | -nobxmLink |
| -{no}radixSort | | Use radix sort. Specify -noradixSort to use quick sort instead. | -radixSort |
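
The options above can be combined into a command line programmatically. The following Python sketch is one possible way to do that; the build_mpxsort_args helper and its parameter names are hypothetical, and it uses only options listed in Table 1. It always passes -bxmLink, because the default is -nobxmLink and only bxmlink files are supported. The directory paths are the ones from the usage example.

```python
import shlex


def build_mpxsort_args(ent_type, inp_dir, out_dir, n_threads=None, radix_sort=True):
    """Build an mpxsort argument list from options listed in Table 1.

    Hypothetical helper for illustration; it only assembles the
    arguments and does not run the utility.
    """
    args = [
        "mpxsort",
        "-entType", ent_type,
        "-bxmLink",  # required in practice: the default is -nobxmLink
        "-bxmInpDir", inp_dir,
        "-bxmOutDir", out_dir,
    ]
    if n_threads is not None:
        # Omit to let the utility default to the number of CPUs.
        args += ["-nThreads", str(n_threads)]
    if not radix_sort:
        # Trade speed for lower memory use on constrained servers.
        args.append("-noradixSort")
    return args


# Memory-constrained variant of the household (hh) usage example.
low_memory = build_mpxsort_args("hh", "/bxminp", "/bxmout", n_threads=4, radix_sort=False)
print(shlex.join(low_memory))
```

The resulting list can be passed to subprocess.run on a server where the utility is installed; this sketch stops at building the arguments.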



Last updated: 14 Nov 2014
