IBM Streams 4.2

Developing streams processing applications with consistent regions

Because of business requirements, some applications require that all tuples in an application are processed at least once. You can use a consistent region in your streams processing applications to avoid data loss due to software or hardware failure and meet your requirements for at-least-once processing.

A consistent region is a subgraph where the states of the operators become consistent by processing all the tuples and punctuation marks within defined points on a stream. This enables tuples within the subgraph to be processed at least once. The consistent region is periodically drained of its current tuples. All tuples in the consistent region are processed through to the end of the subgraph. In-memory state of operators are automatically serialized and stored on checkpoint for each of the operators in the region.

If any operator in a consistent region fails at run time, IBM® Streams detects the failure and triggers the restart of the operator and the reset of the region. In-memory state of operators are automatically reloaded, and deserialized on reset of the operators.

The capability to drain the subgraph, which is coupled with start operators that can replay their output streams, enables a consistent region to achieve at-least-once processing.

A stream processing application can be defined with zero, one, or more consistent regions. You can define the start of a consistent region with the @consistent annotation on an operator. IBM Streams then determines the scope of the consistent region automatically, but you can reduce the scope of the region with the @autonomous annotation. Your application must include one new JobControlPlane operator that coordinates the draining and resetting of the operators in the consistent regions.

When a subgraph is a consistent region, IBM Streams enables the operators in that region to drain and reset. When a region is draining, it establishes logical divisions in the output streams of each operator in the region. A drain is successful when all operators in the region establish a logical division in their output streams, and when all tuples before the logical division are processed in their input streams. If a drain is successful, it means that all operators in the region consumed all input streams up until the established logical division. While the region is draining or resetting, operators in the region that completed their draining or resetting cannot submit new tuples. This behavior means that the tuple flow within the subgraph briefly stops while the region is draining or resetting.

Operators in a consistent region do not process or submit final punctuation markers. When final markers are received or submitted, that punctuation is silently dropped by the IBM Streams instance.

Requirement: Applications with consistent regions must use the TCP transport.
Hint: Processing elements with operators that reside in a consistent region start up with a Java virtual machine. As a result, applications with consistent regions are expected to consume more memory than applications without consistent regions.
To summarize the requirements for implementing a consistent region:
  • The start of a consistent region is defined with the @consistent annotation on an operator, making it a start operator.
  • A start operator must be able to replay its output stream and typically is a source operator. For a list of start operators, see the topic about operators that support consistent regions.
  • If a composite operator is annotated with @consistent, a start operator can also be an operator whose input stream connections originate from outside of the enclosing annotated composite. Again, such an operator must be able to replay its output stream. If it is not, the composite must include a ReplayableStart operator as the start operator. Remember, however, that applications that use a ReplayableStart operator cannot be deployed to an instance with fewer than four host resources.
  • Your application must include one JobControlPlane operator.
  • A checkpoint data store must be configured for the domain or instance running your application.