Link Collector Stages

These topics describe how to use a Link Collector stage in your job design.

The Link Collector stage is an active stage that accepts up to 64 input links, collects the data from them, and routes it along a single output link. The stage expects the output link to use the same metadata as the input links.
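The two constraints stated above (at most 64 input links, and the same metadata on every link) can be pictured as a simple consistency check. The following Python sketch is illustrative only and is not part of DataStage: the function name `check_collector_links` and the representation of link metadata as lists of (column name, type) pairs are assumptions made for this example.

```python
MAX_INPUT_LINKS = 64  # limit stated in the text above


def check_collector_links(input_links, output_columns):
    """Return a list of problems; an empty list means the layout is consistent.

    input_links    -- dict mapping a (hypothetical) link name to its column list
    output_columns -- column list of the single output link
    Each column list is a list of (column_name, type_name) tuples.
    """
    problems = []
    if not 1 <= len(input_links) <= MAX_INPUT_LINKS:
        problems.append(
            f"expected 1 to {MAX_INPUT_LINKS} input links, got {len(input_links)}"
        )
    for link_name, columns in input_links.items():
        # The output link is expected to carry the same metadata as each input.
        if columns != output_columns:
            problems.append(f"{link_name}: columns differ from the output link")
    return problems
```

For example, calling the function with two input links whose column lists match the output link returns an empty list, while a link with a different column type is reported by name.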

Used together with a Link Partitioner stage, the Link Collector stage lets you take advantage of a multiprocessor system by having data processed in parallel. The Link Partitioner stage partitions the data, the partitions are processed in parallel, and the Link Collector stage then collects the data together again before it is written to a single target. To understand the benefits, see IBM InfoSphere DataStage Jobs and Processes, which explains how IBM® InfoSphere® DataStage® jobs are run as processes.

The following diagram illustrates how the Link Collector stage can be used in a job in this way:

Figure: A job with Link Collector stages

For this job to compile and run as intended on a multiprocessor system, you must turn on interprocess buffering, either at the project level using the Administrator client or at the job level in the Job Properties dialog box.

The temporary files generated by this stage are placed in the directory specified by the TEMP environment variable. Use the Administrator client to set TEMP on a per-project basis.
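
The partition, process, and collect pattern described in this topic can be pictured outside DataStage with a short Python sketch. It is a conceptual illustration under stated assumptions, not how the stages are implemented: all function names are hypothetical, a thread pool stands in for the separate processes that run the parallel branches, and round-robin ordering is assumed for both partitioning and collecting purely to keep the example small.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(rows, n_links):
    """Deal rows out to n_links lists in turn (stand-in for a partitioner)."""
    links = [[] for _ in range(n_links)]
    for i, row in enumerate(rows):
        links[i % n_links].append(row)
    return links


def process(link_rows):
    """Hypothetical per-link transformation: upper-case the name column."""
    return [(row_id, name.upper()) for row_id, name in link_rows]


def collect_round_robin(links):
    """Read one row from each link in turn until every link is exhausted."""
    iterators = [iter(link) for link in links]
    collected = []
    while iterators:
        still_open = []
        for it in iterators:
            try:
                collected.append(next(it))
                still_open.append(it)
            except StopIteration:
                pass  # this link has no more rows; drop it from the rotation
        iterators = still_open
    return collected


def run_job(rows, n_links=3):
    """Partition the rows, process each partition concurrently, then collect."""
    links = partition_round_robin(rows, n_links)
    with ThreadPoolExecutor(max_workers=n_links) as pool:
        processed = list(pool.map(process, links))
    return collect_round_robin(processed)
```

In this sketch every partition has the same row layout before and after processing, which mirrors the expectation that the output link of the collector uses the same metadata as its input links.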