Begin Fix Pack 11.4.02 information

Introduction to the high-level integration language

Data processing flows that aggregate facts from large collections of structured or unstructured data into clean, unified entities involve many stages. The stages typically start from the outcome of information extraction (such as output from IBM® Accelerator for Social Data Analytics) and continue with entity resolution, mapping, and fusion. High-level integration language programs capture the overall integration flow through a combination of SQL-like rules.

What is the high-level integration language?

High-level integration language is a scripting language for entity resolution and integration. It supports ETL-like features (mapping and fusion), matching functions, deterministic and probabilistic entity resolution rules, and conflict resolution policies. The language was designed for non-traditional data sources, for example, data that might be sparse, nested, or lacking strong identifying attributes. The language is well suited to entity resolution between unstructured social media data and structured enterprise customer data.
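
The following sketch is a conceptual illustration in Python rather than in the high-level integration language itself. It shows one possible conflict resolution policy of the kind mentioned above: when fused sources disagree on an attribute, the most recently updated non-empty value wins. The record fields and the policy are hypothetical examples, not taken from the product.

    # Conceptual illustration only (Python, not .hil syntax): a simple
    # conflict resolution policy. When fused sources disagree on an
    # attribute, pick the most recently updated non-empty value.
    # All field names here are hypothetical.

    records_for_one_entity = [
        {"phone": "555-0100", "source": "crm",    "updated": "2014-03-01"},
        {"phone": "",         "source": "social", "updated": "2015-01-15"},
        {"phone": "555-0199", "source": "web",    "updated": "2014-11-30"},
    ]

    def resolve(records, attr):
        """Prefer the most recently updated record that has a value for attr."""
        candidates = [r for r in records if r.get(attr)]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r["updated"])[attr]

    print(resolve(records_for_one_entity, "phone"))   # prints 555-0199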

How does the language work?

The language uses entity models and rules to make connections among data and to match existing entities with non-traditional, unstructured data. It enables the creation of entity-centric 360-degree data views that combine internal customer profiles with external social media data.

The language is a core component of IBM BigInsights™, and includes a compiler that transforms scripts in the language (which have the .hil extension) into Jaql programs.

The language exposes a data model and constructs that are specific to the various tasks in entity integration flows. First, it defines the main entity types, which are the logical objects that a user intends to create and manipulate. Each entity type represents a collection of entities, possibly indexed by certain attributes. Indexes are logical structures that form an essential part of the design of the language; they facilitate the hierarchical, modular construction of entities from the ground up, reflecting the philosophy that entities are built or aggregated from simpler, lower-level entities. A key feature of the language is its use of record polymorphism and type inference, which allow schemas to be partially specified and, in turn, enable incremental development in which entity types evolve and increase in complexity. Another feature is a flexible, open type system that allows the language to handle input data that is irregular, sparse, or only partially known.
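
The following sketch is a conceptual illustration in Python rather than in the high-level integration language itself. It shows the idea of building a higher-level entity on top of a logical index of lower-level facts, and how plain records tolerate extra or missing fields in the spirit of the open type system described above. All record and field names are hypothetical.

    from collections import defaultdict

    # Conceptual illustration only (Python, not .hil syntax):
    # low-level employment facts are grouped into an index keyed by person
    # name, and a higher-level Person entity is then aggregated from that
    # index rather than directly from the raw data.

    raw_records = [
        {"name": "Anna Kim", "employer": "Acme", "handle": "@akim"},  # extra field is tolerated
        {"name": "Anna Kim", "employer": "Initech"},
        {"name": "Raj Patel", "employer": "Acme", "city": "Austin"},  # extra field is tolerated
    ]

    # Lower-level structure: a logical index from a key (the person name)
    # to the employment facts recorded for that key.
    employment_index = defaultdict(list)
    for r in raw_records:
        employment_index[r["name"]].append({"company": r["employer"]})

    # Higher-level entity: Person entities are built from the index.
    persons = [{"name": name, "employment": facts}
               for name, facts in employment_index.items()]

    print(persons)
    # [{'name': 'Anna Kim', 'employment': [{'company': 'Acme'}, {'company': 'Initech'}]},
    #  {'name': 'Raj Patel', 'employment': [{'company': 'Acme'}]}]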

Rules within .hil files create the flows that control entity integration and output. The language supports two types of integration rules:
  • Entity population (EP) rules express the mapping and transformation of data from one type into another, as well as fusion and aggregation of data.
  • Entity resolution (ER) rules express the matching and linking of entities by capturing all possible ways of matching entities and by using constraints to declare properties of the desired output, such as one-to-one or one-to-many matches (a conceptual sketch follows this list).
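
The following sketch is a conceptual illustration in Python rather than in the high-level integration language itself. It mirrors the two-part structure of an ER rule described above: first enumerate all candidate links allowed by a matching predicate, then enforce a one-to-one constraint on the output. The data, the matching predicate, and the scoring function are hypothetical examples.

    # Conceptual illustration only (Python, not .hil syntax).
    customers = [
        {"cust_id": 1, "name": "Anna Kim", "city": "Austin"},
        {"cust_id": 2, "name": "Raj Patel", "city": "Boston"},
    ]
    profiles = [
        {"profile_id": "p7", "display_name": "anna kim", "location": "Austin, TX"},
        {"profile_id": "p9", "display_name": "A. Kim", "location": "Austin"},
    ]

    def is_candidate(cust, prof):
        """Matching predicate: normalized name equality plus a weak location check."""
        same_name = cust["name"].lower() == prof["display_name"].lower()
        same_city = cust["city"].lower() in prof["location"].lower()
        return same_name and same_city

    def score(cust, prof):
        """Crude score used only to rank candidates when enforcing the constraint."""
        return (int(cust["name"].lower() == prof["display_name"].lower())
                + int(cust["city"].lower() in prof["location"].lower()))

    # Phase 1: capture all possible matches allowed by the predicate.
    candidates = [(c, p) for c in customers for p in profiles if is_candidate(c, p)]

    # Phase 2: enforce a one-to-one constraint on the output links,
    # keeping the strongest candidates first.
    links, used_customers, used_profiles = [], set(), set()
    for cust, prof in sorted(candidates, key=lambda cp: -score(*cp)):
        if cust["cust_id"] not in used_customers and prof["profile_id"] not in used_profiles:
            links.append({"cust_id": cust["cust_id"], "profile_id": prof["profile_id"]})
            used_customers.add(cust["cust_id"])
            used_profiles.add(prof["profile_id"])

    print(links)   # prints [{'cust_id': 1, 'profile_id': 'p7'}]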

The high-level integration language components of IBM Big Match include .hil files, which are used primarily for matching MDM and social data, and a compiler that transforms .hil scripts into Jaql programs. The language also exposes a Java™ API that tools can use to compile language programs into Jaql directly from Java. The language currently does not have its own runtime component; it relies on Jaql for execution.



Last updated: 25 Jun 2015
End Fix Pack 11.4.02 information