Improving the performance of equality or comparison evaluation

Hashing expressions can improve performance at equality-test stage.

Test evaluations that involve variables from two different rule conditions can be time consuming. In the following example, the condition on Account references the variables ?p and ?b bound in the previous conditions.

Example: Bank accounts

when {
  ?p:Person(age > 20);
  ?b:Bank();
  ?a:Account(name equals ?p.name; bank == ?b; 
            balance < ?p.minBalance);
  ...
}

Without optimization, if there are 1,000 instances of Person with age > 20 and 5 instances of Bank, the three tests in the Account condition are executed 5,000 times every time a new instance of Account is added. This is because this new instance has to be tested against all the possible combinations of the ?p and ?b variables.

Autohashing optimization minimizes the execution of the tests based on the equality operators by relying on a hash table. A hash table is conceptually a contiguous section of memory with a number of addressable elements in which data can be quickly added, deleted, and found. Hash tables increase memory consumption for the purpose of gaining speed. They are certainly not the most memory-efficient means of storing data, but they provide very fast look-up times.

To successfully store and retrieve data from a hash table, a link is generated.That link can be mapped to one of the addressable elements of the hash table. Java™ classes implement the hashCode method to generate those links. Test execution can be reduced by a factor of 20, which is the dimension of the table that holds the links. So at best, there are 20 different entries for a value. In the Bank accounts example above, this saving would relate to 250 tests instead of 5,000 every time a new instance of Account is added.

Note:

Hashers always return a superset of the matching objects and the equality conditions still require a verification of the objects.

The following figure illustrates autohashing in the RetePlus network. It shows how the Bank accounts example would be handled in Decision Server.

Autohashing in the RetePlus network

The RetePlus network uses hashing to perform operations on object equality tests such that a link is generated from that data. The link identifies which memory element of the hash table contains the satisfied equality test. In other words, the hash code is the link to the object matching the side of the test. The test is replaced by a hash code value and the corresponding object is stored in the hash table with that value. The hashing mechanism then uses the equality operator (==) to test whether two objects are equal according to the equals(Object) method. When the hashCode method is called on each of the two objects, it must produce the same integer result.

Join nodes are generated from these links inside the RetePlus network. A join node uses a single hash table. A hash table can process multiple equality tests.

To use autohashing, you need to create a configuration resource file so that the autohash mode is set to true. By default the autohash mode is turned off.

Using hashing expressions

Within a ruleset header, hashing expressions can be specified on a class. Hashing expressions and autohashers are not mutually exclusive, if both are defined a class hasher always takes precedence over an autohasher.

The following ruleset header provides an example of when to use a class hasher. The class Customer is assigned a category, and the possible categories are defined as int values.

Note:

Only methods that return an int can be used as a hasher.

ruleset CustomerRuleset
{
    hasher(Customer c) = c.getCategory();
    ...
};

The keyword hasher introduces a hasher definition. In this example, a hasher for the class Customer is defined. The definition says that for an object of the class Customer named c, a hasher is the Customer's Category.

A class may have several hashers. For example, you could add this hasher to the previous header:

hasher(Customer c) = c.getName().length(); 

This Customer hasher is defined as the length of its Customer.Name.

The rule engine exploits the hashers to optimize its internal matching algorithm. Whenever the expression provided by the hasher is used in a == (equality operator) test, the hasher can be applied. Here are some rule conditions on which a hasher can be chosen and applied:

// Query customers by category
CustomerQuery(?c: getCategory());
?cus: Customer(?cus.getCategory() == c);

or:

// Query customers by the length of their names
CustomerQuery(?n: getName());
?cus: Customer(?n.length() == ?cus.getName().length());

For these two condition groups, the rule engine recognizes that one of the expressions on either side of the equals == sign matches one of the provided hashers, and it applies that hasher for optimization.

Internally, the engine classifies the instances of the class using the int value computed by the hashing expression.

Note:
  • There could be many tests in the condition part, and a hasher is applied when a side of a condition equals == test matches the hashing expression.
  • Hashers cannot be used with an equals method.

Class hashers are more flexible than the autohashing mechanism provided through the configuration file, because you can choose the most appropriate hashing expressions with the ILOG® Rule Language (IRL).

Using hasher properties

The ruleset hasher definition is extended with hasher properties, shown in the following property definition block:

hasher ( Customer c ) = c.age
{
     property accurate = true;
     property ordered = true;
     property final = true;
}

The accurate property

The accurate property activates the internal use of more efficient hashing tables. However, this may induce greater memory consumption. Use the accurate property when the domain of the hashing expression is not too large. For example, you can expect that a c.age attribute has a small domain included. At the opposite end, it is likely that a population attribute of a City class has a large domain and should not be related to accurate tables.

The ordered property

The ordered property indicates that the hasher is compatible with inequality test expressions. It requires that a domain be defined on the hashing expression in the XOM. The hash processing is then performed on either equality or comparison expressions. Using a hashing comparison expression, you can expect to speed a test evaluation up to twice as much.

As an example, consider the activation of an ordered hasher on the age > a.value test in the rule filterCustomer:

rule filterCustomer
{
 when {
    a: Account();
    c: Customer ( age > a.value );
 }
 then {
    ...
 }
}

Note that the Customer.age attribute must have been related to a domain in the XOM, or else the ordered hasher would not be activated on the comparison test.

The final property

Declaring a hasher as final can speed up hash processing when the object set will be constant or almost constant during ruleset execution. The Decision Server rule engine takes advantage of this constancy to further improve performance.

Declaring a changing hashing set as final does not cause execution errors or alter the behavior, but it may slow down the execution.

As an example, consider the following declaration:

hasher ( Town t ) = c.population
{
   property final = true;
}

The set of every town in the working memory is then supposed to be nearly constant. While any changes applied on the population of a town or insertions of new towns are possible, these operations could be more costly in execution time than if the final property had not been applied.