Creating parallel operators

Most of the operators that you create will be parallel. The processing power of a parallel job is directly related to its ability to run operators in parallel on multiple processing nodes in a system.

Graphically, a parallel operator can be represented as shown in the following figure. This figure shows a two-input, one-output operator.

Figure 1. Parallel operator (two-input, one-output operator)

Every step that contains a parallel operator also contains a partitioner, which divides the input data set into individual partitions for each processing node in the system. The partitioning method can either be supplied by the framework or defined by you.
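
To make the idea of partitioning concrete, the following is a minimal, language-neutral sketch in Python of two common partitioning methods. It is not the framework's API: the function names round_robin_partition and hash_partition, the num_nodes parameter, and the representation of records as dictionaries are all assumptions made for illustration only.

```python
from zlib import crc32


def round_robin_partition(records, num_nodes):
    """Deal records out in turn, producing one partition per processing node.

    Hypothetical helper for illustration; not part of any framework.
    """
    partitions = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        partitions[i % num_nodes].append(record)
    return partitions


def hash_partition(records, num_nodes, key):
    """Place all records that share a key value in the same partition.

    Hypothetical helper for illustration; records are assumed to be dicts.
    """
    partitions = [[] for _ in range(num_nodes)]
    for record in records:
        # crc32 gives the same value on every run, unlike the built-in
        # hash() for strings, so the partition assignment is repeatable.
        index = crc32(str(record[key]).encode("utf-8")) % num_nodes
        partitions[index].append(record)
    return partitions
```

Round-robin partitioning balances the number of records per node without looking at their contents, whereas hash partitioning keeps related records together, which matters when an operator must see every record for a given key on the same node.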