IBM InfoSphere Master Data Management, Version 11.3

Developing the algorithm

Within the InfoSphere® MDM Workbench, you can create one or more algorithms for the big data matching applications to use.

About this task

When you run big data matching, the applications read the contents of an .xml configuration file for each table. That configuration file indicates which algorithm (or algorithms) to use for processing the data for the table.
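The following Python sketch shows how a per-table configuration file can name one or more algorithms. It uses the comma-separated list convention described in this topic. The XML element and attribute names (matchConfig, table, algorithms) and the function algorithms_for_table are hypothetical and chosen only for illustration. They do not reflect the actual schema of the big data matching configuration file.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-table configuration. The element and attribute names
# ("matchConfig", "table", "algorithms") are illustrative only and are not
# the actual schema of the big data matching configuration file.
SAMPLE_CONFIG = """
<matchConfig table="person_data">
  <algorithms>query, resolve</algorithms>
</matchConfig>
"""


def algorithms_for_table(xml_text: str) -> tuple[str, list[str]]:
    """Return the table name and the ordered list of algorithm names."""
    root = ET.fromstring(xml_text.strip())
    table = root.get("table", "")
    raw = root.findtext("algorithms", default="")
    # Split the comma-separated list, trim whitespace, and drop empty entries.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return table, names


if __name__ == "__main__":
    table, names = algorithms_for_table(SAMPLE_CONFIG)
    print(table, names)  # person_data ['query', 'resolve']
```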

In general, creating an algorithm for big data matching does not differ from creating an algorithm for a typical installation of IBM® InfoSphere Master Data Management. For the required steps, refer to the existing InfoSphere MDM documentation on developing algorithms.

One exception is integration with IBM InfoSphere Global Name Recognition (GNR), a name-recognition and name-scoring technology that classifies, searches, analyzes, and compares global name data sets. GNR relies on a relational database infrastructure that is not available with InfoSphere BigInsights™, so algorithms for big data matching cannot rely on GNR integration.

Many users need to provide data across their enterprise to different teams, user personas, and other organizations. To support this, big data matching enables you to use multiple algorithms within a single configuration. For example, you might want to run both query and resolve algorithms on the same data. In the .xml configuration file, you specify the relevant algorithms as a comma-separated list. The applications write the output of all algorithms to a single HBase table and tag the output of each algorithm with an algorithm identifier. Because of this identifier, you can run multiple algorithms for the same member type and still distinguish their results.
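To illustrate how an algorithm identifier keeps the output of several algorithms apart in one table, the following Python sketch uses an in-memory dictionary as a stand-in for the HBase output table. It does not use the HBase API. The class OutputTable, its methods, and the row-key layout (algorithm identifier, member type, and record ID joined by "|") are hypothetical and chosen only for illustration.

```python
# In-memory stand-in for the single HBase output table. This sketch does not
# use the HBase API; the class, its methods, and the row-key layout
# "<algorithm id>|<member type>|<record id>" are hypothetical.
class OutputTable:
    def __init__(self) -> None:
        self.rows: dict[str, dict[str, str]] = {}

    def put(self, algorithm_id: str, member_type: str, record_id: str,
            values: dict[str, str]) -> None:
        # Lead every row key with the algorithm identifier so that results
        # from different algorithms for the same member type never collide.
        key = f"{algorithm_id}|{member_type}|{record_id}"
        self.rows[key] = dict(values)

    def scan_algorithm(self, algorithm_id: str) -> dict[str, dict[str, str]]:
        # Return only the rows written by one algorithm (a row-key prefix scan).
        prefix = f"{algorithm_id}|"
        return {k: v for k, v in self.rows.items() if k.startswith(prefix)}


if __name__ == "__main__":
    table = OutputTable()
    # Run two algorithms against the same "person" member type.
    for algorithm_id in ["query", "resolve"]:
        table.put(algorithm_id, "person", "1001", {"entity": f"{algorithm_id}-E1"})
    print(len(table.rows))                          # 2
    print(sorted(table.scan_algorithm("resolve")))  # ['resolve|person|1001']
```

In this sketch the algorithm identifier leads the row key, so reading one algorithm's results reduces to a prefix scan. HBase stores rows sorted by row key, which makes prefix scans an efficient access pattern there as well.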

Last updated: 27 June 2014