IBM InfoSphere Streams Version 4.1.1

Parallel transformations

When you submit a job for a streams processing application that has user-defined parallelism, parallel transformation occurs. Parallel transformation converts the logical application into a physical application that can be deployed.
The transformation maps every logical processing element (PE) to at least one physical PE, and every logical operator to at least one physical operator. However, it replicates only the logical PEs and operators that are inside parallel regions. PEs and operators are replicated according to the following rules:
  • If a logical PE contains only operators from inside a specific parallel region, that logical PE is replicated. Each logical operator in that PE becomes N physical operators, where N is the width, or degree of parallelism, that you specified.
  • If a logical PE contains operators from outside a specific parallel region, that logical PE is not replicated. Instead, the operators that are in the parallel region are replicated inside the PE. In addition:
    • If a splitter is fused inside the PE, threaded ports must be inserted before the operators that the splitter communicates with.
    • If the merge point is not fused inside the PE, new PE output ports are created as required.
  • Directly adjacent parallel regions, where the output ports of one parallel region feed directly into the input ports of another parallel region, always have shuffles between them. A shuffle means that each of the replicated operators at the end of the first region has a splitter that feeds into the replicated operators at the beginning of the second parallel region.
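
The first two replication rules can be illustrated with a small model. The following Python sketch is not part of the product: the function `parallel_transform`, the PE and operator names, and the `name[channel]` naming scheme are all assumptions made for illustration. It ignores splitters, threaded ports, and PE output ports, and models only which PEs and operators are replicated.

```python
def parallel_transform(logical_pes, region_ops, width):
    """Model the replication rules for one parallel region.

    logical_pes: dict mapping a logical PE name to its list of logical operators.
    region_ops:  set of logical operator names that are inside the parallel region.
    width:       N, the width of the parallel region.
    Returns a dict mapping physical PE names to lists of physical operator names.
    The "name[channel]" naming scheme is hypothetical.
    """
    physical = {}
    for pe_name, ops in logical_pes.items():
        if all(op in region_ops for op in ops):
            # The PE contains only operators from inside the region:
            # replicate the whole PE, one copy per channel.
            for ch in range(width):
                physical[f"{pe_name}[{ch}]"] = [f"{op}[{ch}]" for op in ops]
        else:
            # The PE contains operators from outside the region:
            # keep one PE and replicate only the region's operators inside it.
            expanded = []
            for op in ops:
                if op in region_ops:
                    expanded.extend(f"{op}[{ch}]" for ch in range(width))
                else:
                    expanded.append(op)
            physical[pe_name] = expanded
    return physical


# Hypothetical application: Parse, Filter, and Enrich are in the parallel region.
logical_pes = {
    "PE_A": ["Src"],
    "PE_B": ["Parse", "Filter"],
    "PE_C": ["Enrich", "Sink"],
}
result = parallel_transform(logical_pes, {"Parse", "Filter", "Enrich"}, 2)
for pe, ops in result.items():
    print(pe, ops)
```

In this model, `PE_B` is replicated as a whole because all of its operators are in the region, while `PE_C` stays a single PE that holds two copies of `Enrich` next to the unreplicated `Sink`.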

A sibling operator is a physical operator that is derived from the same logical operator as another physical operator. Likewise, a sibling PE is a physical PE that is derived from the same logical PE as another physical PE. Sibling PEs and operators are identical except for the names, indexes, and placement configuration options.
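
As a rough illustration of the sibling relationship, the following Python sketch records the logical origin of each physical operator and compares origins. The `PhysicalOperator` class, its fields, and the operator names are hypothetical and are not taken from any Streams API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalOperator:
    logical_name: str  # logical operator that this physical operator derives from
    index: int         # index that distinguishes this copy from its siblings
    name: str          # physical name (hypothetical naming scheme)


def are_siblings(a, b):
    """Two distinct physical operators are siblings if they share a logical origin."""
    return a != b and a.logical_name == b.logical_name


parse0 = PhysicalOperator("Parse", 0, "Parse[0]")
parse1 = PhysicalOperator("Parse", 1, "Parse[1]")
filter0 = PhysicalOperator("Filter", 0, "Filter[0]")

print(are_siblings(parse0, parse1))   # same logical operator, different index
print(are_siblings(parse0, filter0))  # different logical operators
```

The same comparison applies to sibling PEs if the record stores the logical PE name instead of the logical operator name.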

The application manager service performs the parallel transformation and tracks the resulting changes. When the transformation is complete, every output port with a splitter that feeds into a parallel region has N replicated connections that feed into the N replicated operators, where N is the fully expanded width of that parallel region.
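
The resulting connection counts can be sketched as follows. This Python model is an illustration only: the function names, operator names, and `name[channel]` scheme are assumptions, and the width passed in is taken to be the already fully expanded width of the region. The second function applies the same fan-out to the shuffle between directly adjacent parallel regions.

```python
def splitter_connections(upstream, region_op, width):
    """One splitter output port fans out to all `width` replicas of region_op."""
    return [(upstream, f"{region_op}[{ch}]") for ch in range(width)]


def shuffle_connections(first_op, first_width, second_op, second_width):
    """Each replica at the end of the first region has its own splitter
    that feeds every replica at the beginning of the second region."""
    connections = []
    for ch in range(first_width):
        connections.extend(
            splitter_connections(f"{first_op}[{ch}]", second_op, second_width)
        )
    return connections


# A single upstream operator feeding a region of width 3: 3 connections.
fan_out = splitter_connections("Src", "Parse", 3)

# A region of width 2 feeding an adjacent region of width 3: 2 * 3 connections.
shuffle = shuffle_connections("Parse", 2, "Enrich", 3)

print(len(fan_out), len(shuffle))
```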