IBM InfoSphere DataStage and InfoSphere QualityStage, Version 8.5

Example of generating data in parallel

By default, the Row Generator stage runs sequentially and generates data in a single partition. You can, however, configure it to run in parallel, and you can then use the partition number when generating data, for example to increment a value by the number of partitions. When the stage runs in parallel, each partition receives the full Number of Records that you specify: if you ask for 100 records, you get 100 records in each partition, not 100 records divided among the partitions.

In this example you generate a data set with two integer columns. One is generated by cycling, the other by random number generation.

The cycling integer's initial value is set to the partition number (using the special value 'part') and its increment is set to the number of partitions (using the special value 'partcount'). These values are set in the Edit Column Meta Data dialog box as follows (select the column in the Columns tab and choose Edit Row... from the shortcut menu):

Figure 1. Generator settings
Shows the setting in the Edit Column Meta Data dialog box for generating rows in partitions
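
The effect of these cycle settings can be sketched outside DataStage. The following Python fragment is only an illustration, not DataStage code: the function name cycle_values and the variable names are hypothetical, and it assumes four partitions and 100 records per partition, as in this example.

```python
def cycle_values(part, partcount, num_records):
    """Model one partition's cycled column.

    Assumption for illustration: initial value = partition number ('part'),
    increment = number of partitions ('partcount').
    """
    return [part + i * partcount for i in range(num_records)]


partcount = 4                 # a system with four nodes
records_per_partition = 100   # the Number of Records setting

# Each partition generates the full record count on its own.
partitions = {
    part: cycle_values(part, partcount, records_per_partition)
    for part in range(partcount)
}

total_records = sum(len(rows) for rows in partitions.values())

print(partitions[0][:6])   # [0, 4, 8, 12, 16, 20]
print(total_records)       # 400
```

Because each partition starts at its own number and steps by the partition count, no two partitions produce the same value in this model, and the four partitions together yield 400 records rather than 100.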

The random integer's seed value is set to the partition number, and the limit to the total number of partitions.
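
The idea of seeding each partition with its own number can be sketched in the same way. This Python fragment is an assumption-laden illustration: it uses Python's random module, not the DataStage generator, so its values will not match the output of a real job. The function name random_values is hypothetical, and the sketch assumes that values fall in the range 0 to limit minus 1.

```python
import random


def random_values(part, partcount, num_records):
    """Model one partition's random column.

    Assumptions for illustration: seed = partition number ('part'),
    limit = number of partitions ('partcount'), values in 0 .. limit - 1.
    """
    rng = random.Random(part)  # a separate, repeatable stream per partition
    return [rng.randrange(partcount) for _ in range(num_records)]


partcount = 4
partition0 = random_values(0, partcount, 100)

# Re-running with the same seed reproduces the same sequence.
assert partition0 == random_values(0, partcount, 100)
print(len(partition0), min(partition0) >= 0, max(partition0) < partcount)
```

Seeding by partition number gives each partition a repeatable sequence of its own, which is the design point of the setting described above.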

When you run this job in parallel, on a system with four nodes, the data generated in partition 0 is as follows:

Table 1. Partition 0 data

integer1  integer2
0         1
4         2
8         3
12        2
16        3
20        1
24        2
28        3
32        3
36        1
40        0
44        3
48        3
52        2
56        0
60        0
64        1
68        3
72        2
76        2
80        0
84        1
...       ...

This topic is also in the IBM InfoSphere DataStage and QualityStage Parallel Job Developer's Guide.

Last updated: 2012-10-8