Shared aggregates

Use shared aggregates to compute the value of an entity attribute by aggregating values from events that are associated with this entity. You can use the following aggregation operators: maximum, minimum, average, total, and number of. Shared aggregates can be more efficient than aggregates defined in rules to calculate a value from the history of past events because they can be used by multiple agents and applied to multiple time periods and time points.

A shared aggregate is related to an attribute of an entity type in the business model definitions. The attribute must be specified with a data type of number or numeric. For example, an aggregate named purchase count is defined as an attribute of the customer entity:

a customer is a business entity identified by an email with
   a first name,
   a last name,
   a purchase count (a number).

In the business model statements, you specify the attribute name for which you want to compute a value by using a shared aggregate. You then specify how the event type is related to the entity, and set any conditions and the time during which events can be queried. In the <aggregate expression>, you can add a time condition to specify the length of time over which the aggregation is calculated, as well as other conditions.

For example, the following statement defines the aggregation of values for the purchase count as the total number of purchase events that the customer made:

the purchase count of a customer is aggregated from purchase events, 
   where this customer comes from the customer of each purchase event 
   as the number of purchase events available for 30 days.

Depending on how the aggregate is defined, rules can query aggregated values for a specific time point or a specific time period. The available for <time duration> clause limits the time during which you can query the aggregate. In this example, the time point or time period must be within 30 days of the most recent event received.

An aggregate definition can contain conditions with time filters that refer implicitly or explicitly to the current time now. For aggregates that refer to the current time, you can use time point queries, but not time period queries.

For example, the following aggregate definition refers to now, because it contains a time filter with current month:

the longest monthly delay of a train is aggregated from train delays,
   where this train comes from the train of each train delay
   as the maximum delay of all train delays during the current month
   available for 1 year.

The following aggregate definition does not refer to now, because the day of the week is based on the timestamp of the event and does not depend on now:

the average weekday delay of a train is aggregated from train delays,
   where this train comes from the train of each train delay
   as the average delay of all train delays
   where the day of the week of each train delay is not one of { Saturday, Sunday }
   available for 1 year.

Aggregates can be queried as far back as the available-for period: time point queries must be within this period, and time period queries must start within this period. In order to have reproducible results, the system keeps events in the backing database for up to twice the available-for period. Therefore, the horizon of the events is automatically calculated to be twice the available-for period. The following diagram shows the period during which you can query aggregates on a timeline. You can query an aggregated value for any time point or time period that is in the green zone, that is, during the available-for period and afterward. Events are deleted when they enter the red zone, where they are older than the horizon.
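The relationship between the available-for period and the horizon can be sketched in a few lines of Python. This is an illustration only, not the product implementation: the function names are hypothetical, the reference point is assumed to be the most recent event received, and the boundaries are assumed to be inclusive.

```python
from datetime import datetime, timedelta


def query_window(latest_event: datetime, available_for: timedelta):
    """Return (T, H): the start of the available-for period and the horizon.

    Assumption: both are measured back from the most recent event received.
    """
    start_of_available = latest_event - available_for
    horizon = latest_event - 2 * available_for  # twice the available-for period
    return start_of_available, horizon


def can_query_time_point(t: datetime, latest_event: datetime,
                         available_for: timedelta) -> bool:
    """A time point is queryable during the available-for period and afterward."""
    start_of_available, _ = query_window(latest_event, available_for)
    return t >= start_of_available


def is_event_retained(event_time: datetime, latest_event: datetime,
                      available_for: timedelta) -> bool:
    """Events older than the horizon are deleted."""
    _, horizon = query_window(latest_event, available_for)
    return event_time >= horizon


latest = datetime(2015, 7, 21, 18, 0)
t_start, horizon = query_window(latest, timedelta(days=30))
print(t_start)  # 2015-06-21 18:00:00
print(horizon)  # 2015-05-22 18:00:00
```

With an available-for period of 30 days, events are therefore kept for up to 60 days, although queries can reach back only 30 days.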

Figure 1. Period during which you can query an aggregated value

Shared aggregates depend on the existence of the attribute's entity and cannot be computed if the entity does not exist. Events that arrive before the creation of an entity cannot be used to aggregate a value for an attribute of this entity. Entity initializers are applied before shared aggregates are updated. Therefore, you can ensure the existence of an entity before an event is aggregated by initializing the entity from this event.

You can use shared aggregates in the rule and Java™ agents of your solution. You can also access shared aggregates from the REST API.

Default values

The aggregation returns a default value when the time point or time period of the query falls within the available-for period or later, and the selected collection of events is empty. You can add a specific default value to the aggregation that applies when the amount of data that is aggregated is not sufficient to provide significant results. You use the following construct:
defaulting to <number|null> [ if there are less than <number> events ]

If no default value is specified, aggregations that use the maximum, minimum, average, and total operators return null. You cannot specify a default value when using the number of operator, which returns 0 when there is no data to process.
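The default-value behavior described in this section can be modeled with a small Python sketch. It is an illustration under stated assumptions, not the engine's code: the aggregate function and its default and min_events parameters are hypothetical names, None stands in for null, and raising an error for a default on the number of operator is only one way to model the restriction.

```python
from statistics import mean

# Operators that return null (None) on empty input unless a default is given.
OPERATORS = {"maximum": max, "minimum": min, "average": mean, "total": sum}


def aggregate(operator, values, default=None, min_events=None):
    """Aggregate the selected values, applying the default-value rules.

    default    -- models 'defaulting to <number|null>'
    min_events -- models 'if there are less than <number> events'
    """
    if operator == "number of":
        # The 'number of' operator takes no default and returns 0 for no data.
        if default is not None or min_events is not None:
            raise ValueError("'number of' does not accept a default value")
        return len(values)
    # Without an explicit threshold, the default applies only to empty input.
    threshold = max(1, min_events or 1)
    if len(values) < threshold:
        return default
    return OPERATORS[operator](values)


print(aggregate("total", []))                                      # None
print(aggregate("total", [], default=0))                           # 0
print(aggregate("total", [600, 700], default=0, min_events=10))    # 0
print(aggregate("total", [600, 700], default=0))                   # 1300
print(aggregate("number of", []))                                  # 0
```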

Resolution

You can specify a temporal resolution in the aggregation definition to improve the performance of the aggregate. The default resolution is 1 second. A resolution combines the data of multiple events over its specified period. When a shared aggregate defines a resolution of 1 minute, all of the aggregated events that fall within the same 1-minute bucket are combined.

When a rule queries the shared aggregate, the time that is used in the query is adjusted to use a set of 1-minute buckets. Only whole buckets can be included in a result. This means that buckets that start or end in the time period are also used in the result in their entirety. For example, if a rule queries a shared aggregate over a time period that starts at 7/21/2015 6:00:15 PM and ends at 7/21/2015 6:04:50 PM, five 1-minute buckets are selected as close as possible to 6:00:15 and 6:04:50. As a result, the period of the aggregation is adjusted to start at 6:00 and end at 6:05.

Larger resolutions have faster performance, but smaller resolutions are more accurate, because the adjustments are smaller. The resolution can be defined in hours, minutes, and seconds. You use the following construct:
with a resolution of <time duration>
In the following example, the aggregation is defined with a resolution of 24 hours (1 day).
the average purchase of a customer is aggregated from purchase events, 
   where this customer comes from the customer of each purchase event 
   as the average price of all purchase events
   available for 30 days
   with a resolution of 24 hours.

You cannot specify a resolution in aggregate definitions that contain a time filter that references now.

Resolution is especially useful if the aggregate specifies a long period of available events. You can improve the performance with minimal impact on the accuracy of your aggregates if you use the maximum, minimum, and average operators. To make sure that the values are reasonably accurate, the resolution should be much smaller than the smallest period that you intend to query. For example, if you query an aggregate at intervals of weeks or months, you can set the resolution to the order of minutes or hours, and improve performance while retrieving results that are accurate enough to make a valid decision.
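The bucket adjustment described in this section can be reproduced with a short Python sketch. It is a simplified model, not the engine's algorithm: adjust_period is a hypothetical helper, and it assumes that buckets are aligned to a fixed origin (here 1970-01-01 00:00:00), which the text does not state.

```python
from datetime import datetime, timedelta

# Assumed alignment origin for buckets (not stated in the documentation).
ORIGIN = datetime(1970, 1, 1)


def adjust_period(start: datetime, end: datetime, resolution: timedelta):
    """Widen [start, end] to whole buckets; return (start, end, bucket count)."""
    # Round the start down to the beginning of its bucket.
    adjusted_start = start - (start - ORIGIN) % resolution
    # Round the end up to the end of its bucket, unless it is already aligned.
    adjusted_end = end - (end - ORIGIN) % resolution
    if adjusted_end < end:
        adjusted_end += resolution
    buckets = (adjusted_end - adjusted_start) // resolution
    return adjusted_start, adjusted_end, buckets


start = datetime(2015, 7, 21, 18, 0, 15)  # 7/21/2015 6:00:15 PM
end = datetime(2015, 7, 21, 18, 4, 50)    # 7/21/2015 6:04:50 PM
print(adjust_period(start, end, timedelta(minutes=1)))
# (datetime.datetime(2015, 7, 21, 18, 0), datetime.datetime(2015, 7, 21, 18, 5), 5)
```

With a resolution of 1 minute the query is widened by at most 1 minute at each end. With a resolution of 24 hours the same query would be widened to a whole day, which is why the resolution should stay much smaller than the shortest period that you query.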

Examples

A statement for a shared aggregate with no condition
In the following example, the value of the purchase count attribute is calculated from the values of purchase events that are related to a specific customer.
the purchase count of a customer is aggregated from purchase events, 
   where this customer comes from the customer of each purchase event 
   as the number of purchase events
   available for 3 days.

The system computes the total number of purchase events that the customer made. When the aggregate statement contains no time filter in the conditions, a rule can specify a time period from which to retrieve a calculation. A time period query can aggregate only values from events that occurred during the specified time period.

The following diagram shows queries of the purchase count attribute over different time periods. E1, E2, and E3 represent purchase events, T is the beginning of the period during which events are available, and H is the horizon time point. The most recent event refers to the most recent event that the engine received, which might be different from the most recent event that the rule agent processed. The points represent the time at which the event's value could be aggregated.

This diagram shows queries of the purchase count attribute at different time periods.

Values from E1 cannot be aggregated because it occurs before the horizon time point. Query 1 returns null because its time period is before the available period. Query 2 returns the value from E2 because it is the only event that occurred during its time period. Query 3 returns 0 because no event occurred during its time period.
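The three query outcomes can be imitated with a small Python model. The timestamps below are invented for illustration because the diagram's actual values are not given in the text. The purchase_count function is hypothetical, the period is assumed to include its start and exclude its end, and the available period is assumed to be measured back from the most recent event.

```python
from datetime import datetime, timedelta


def purchase_count(events, period_start, period_end,
                   available_for=timedelta(days=3)):
    """Count events in [period_start, period_end); None models null."""
    latest = max(events)                        # most recent event received
    available_start = latest - available_for    # T
    horizon = latest - 2 * available_for        # H
    if period_start < available_start:
        return None                             # period starts before T
    retained = [e for e in events if e >= horizon]  # older events are deleted
    return sum(1 for e in retained if period_start <= e < period_end)


# Hypothetical timeline: T = July 18 12:00, H = July 15 12:00.
e1 = datetime(2015, 7, 14, 12, 0)   # before the horizon
e2 = datetime(2015, 7, 19, 9, 0)
e3 = datetime(2015, 7, 21, 12, 0)   # most recent event
events = [e1, e2, e3]

print(purchase_count(events, datetime(2015, 7, 16), datetime(2015, 7, 17)))  # None
print(purchase_count(events, datetime(2015, 7, 19), datetime(2015, 7, 20)))  # 1
print(purchase_count(events, datetime(2015, 7, 20), datetime(2015, 7, 21)))  # 0
```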

A statement for a shared aggregate with one or more conditions

An aggregate statement can specify time and value conditions.

In the following example, the value of the purchase count attribute is calculated from the values of purchase events that are related to a specific customer during the last period of 3 days.
the purchase count of a customer is aggregated from purchase events, 
   where this customer comes from the customer of each purchase event 
   as the number of purchase events 
   during the last period of 3 days
   available for 1 week.

When the aggregate statement contains one or more time conditions, a rule can specify a time point from which to retrieve a calculation. The time point determines which events are aggregated. If no time point is specified, the value of the variable now of the rule agent is used. When the aggregate statement contains only value conditions, a rule can query only the aggregated values over a time period.

The following diagram shows the result of rule queries of this aggregate at different time points. The solid lines represent the time at which the event satisfies the conditions in the aggregate definition. T represents the beginning of the period during which events are available. Events that occur before this time point cannot be used for aggregations. Events that occur after this time point can be used for aggregation queries that are made during the available period, that is, in the green zone.

This diagram shows an example of different time point queries of an aggregate that has a time condition.

Values from E1 can be aggregated because it occurs before the available period but after the horizon time point. Query 1 returns null because its time point is before the available period. Query 2 returns 1 because it aggregates values from E2. Query 3 returns 1 from E3.

The aggregate statement can contain several conditions. In the following example, the value of the purchase count attribute is calculated from the values of purchase events that have an amount greater than 500, and that occurred during the last period of 3 days.
the purchase count of a customer is aggregated from purchase events, 
   where this customer comes from the customer of each purchase event 
   as the number of purchase events
   during the last period of 3 days,
   where the amount of each purchase is more than 500
   available for 1 week.

The following diagram shows the result of rule queries of this aggregate at different time points.

This diagram shows an example of different time point queries of an aggregate that has several conditions.

Values from E1 can be aggregated because it occurs before the available period but after the horizon time point. Query 1 returns 1 because it aggregates values from E1. Query 2 returns 0 because the amount of E2 doesn't satisfy the value condition in the aggregate definition. Query 3 returns 1 from E3, and doesn't aggregate values from E2, because E2 doesn't satisfy the value condition and it occurs more than 3 days before the query.
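A time point query that combines a time condition with a value condition can be modeled as follows. The timestamps and amounts are invented so that the three queries reproduce the results described above. Purchase and purchase_count_at are hypothetical names, and the 3-day window is assumed to exclude its start and include the query time point.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Purchase(NamedTuple):
    time: datetime
    amount: float


def purchase_count_at(events, time_point,
                      available_for=timedelta(weeks=1),
                      window=timedelta(days=3),
                      min_amount=500):
    """Count purchases over min_amount in the window that ends at time_point."""
    latest = max(e.time for e in events)
    if time_point < latest - available_for:
        return None                      # time point is before the available period
    horizon = latest - 2 * available_for
    return sum(
        1 for e in events
        if e.time >= horizon                              # event is still retained
        and time_point - window < e.time <= time_point    # last period of 3 days
        and e.amount > min_amount                         # value condition
    )


# Hypothetical timeline: T = July 13 12:00, H = July 6 12:00.
events = [
    Purchase(datetime(2015, 7, 12, 12, 0), 600),  # E1: before T, after H
    Purchase(datetime(2015, 7, 15, 12, 0), 200),  # E2: fails the value condition
    Purchase(datetime(2015, 7, 20, 12, 0), 700),  # E3: most recent event
]

print(purchase_count_at(events, datetime(2015, 7, 14, 0, 0)))   # Query 1: 1
print(purchase_count_at(events, datetime(2015, 7, 16, 12, 0)))  # Query 2: 0
print(purchase_count_at(events, datetime(2015, 7, 20, 12, 0)))  # Query 3: 1
```

Query 1 counts E1 even though E1 occurred before the available period, because the query time point itself is inside the available period and E1 is newer than the horizon.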

A statement for a shared aggregate with a default value

In the following example, the value of the bill attribute is calculated from the purchase events that are related to a specific shopping cart entity. The aggregate value is 0 if fewer than 10 purchase events are selected.

the bill of a shopping cart is aggregated from purchase events where this shopping cart comes from the cart of this purchase event
   as the total price of all purchase events
   during the current month,
   where the amount of each purchase is more than 500
   defaulting to 0 if there are less than 10 events
   available for 6 months.

In the following example, the default value of the aggregate is 0, but no condition is defined for the application of this value. The aggregate takes the default value if the query in a rule refers to an aggregation outside of the available period, or if there is no data available and the aggregation would otherwise return null.

the bill of a shopping cart is aggregated from purchase events where this shopping cart comes from the cart of this purchase event
   as the average price of all purchase events
   during the current month,
   where the amount of each purchase is more than 500
   defaulting to 0
   available for 6 months.

In this example, the available period is set to 6 months. This setting allows queries over a long time period in the past, for example to compare the query results at many different time points within the same rule. However, such a long available period also significantly increases memory consumption.