By default, the Netezza connector runs sequentially; you must configure it to perform parallel reads. In a parallel read, the data is divided into subsets, and the subsets are read concurrently by different processing nodes. The Netezza connector supports modulus partitioning. With modulus partitioning, the rows are distributed among the processing nodes by adding a modulus expression on the special Netezza column datasliceid to the WHERE clause. For more information about partition configuration and logical nodes, see the Parallel Job Developer's Guide.
When the job runs, the WHERE clause in your SELECT statement is modified so that each processing node reads only a subset of the rows. For example, with four processing nodes, the nodes run the following statements:
SELECT col1, col2 FROM table WHERE mod(datasliceid,4)=0
SELECT col1, col2 FROM table WHERE mod(datasliceid,4)=1
SELECT col1, col2 FROM table WHERE mod(datasliceid,4)=2
SELECT col1, col2 FROM table WHERE mod(datasliceid,4)=3
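The four statements above can be read as a partition of the data slices: every datasliceid value matches exactly one of the WHERE clauses. The following Python sketch illustrates that idea only; it is not connector code. The function names and the assumption of eight data slices numbered 1 to 8 are hypothetical, chosen for illustration.

```python
def node_for_dataslice(datasliceid: int, node_count: int) -> int:
    """Return the zero-based index of the node whose WHERE clause
    mod(datasliceid, node_count) = index matches this data slice."""
    return datasliceid % node_count


def partition_dataslices(dataslice_ids, node_count: int) -> dict:
    """Group data slice IDs by the processing node that would read them."""
    groups = {node: [] for node in range(node_count)}
    for datasliceid in dataslice_ids:
        groups[node_for_dataslice(datasliceid, node_count)].append(datasliceid)
    return groups


# Hypothetical system with 8 data slices (IDs 1..8) read by 4 processing nodes.
assignment = partition_dataslices(range(1, 9), 4)
print(assignment)
# {0: [4, 8], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

Because each ID lands in exactly one group, no row is read twice and none is skipped. How evenly the work is spread depends on how the rows are distributed across data slices.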
Placeholder | Description | Value when running sequentially |
---|---|---|
[[node-count]] | The total number of processing nodes, which is the level of parallelism for the Netezza connector stage. | 1 |
[[node-number]] | The zero-based index of the current processing node. For example, if there are 4 processing nodes, the processing node indexes are 0, 1, 2, and 3. | 0 |
SELECT * FROM table WHERE mod(datasliceid,[[node-count]])=[[node-number]]
[[node-count]] is replaced with the total number of processing nodes, and [[node-number]] is replaced with the zero-based index of the current processing node.
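The substitution can be pictured as plain text replacement. The Python sketch below is a rough illustration under that assumption, not the connector's actual implementation; `expand_placeholders` and `statements_for_all_nodes` are hypothetical helper names.

```python
TEMPLATE = "SELECT * FROM table WHERE mod(datasliceid,[[node-count]])=[[node-number]]"


def expand_placeholders(statement: str, node_count: int, node_number: int) -> str:
    """Replace both placeholders with the values for one processing node."""
    if not 0 <= node_number < node_count:
        raise ValueError("node_number must be in the range 0..node_count-1")
    return (
        statement.replace("[[node-count]]", str(node_count))
        .replace("[[node-number]]", str(node_number))
    )


def statements_for_all_nodes(statement: str, node_count: int) -> list:
    """Build the statement that each processing node would run."""
    return [
        expand_placeholders(statement, node_count, node_number)
        for node_number in range(node_count)
    ]


# Four processing nodes: one statement per node index 0..3.
for sql in statements_for_all_nodes(TEMPLATE, 4):
    print(sql)

# Sequential run: node count 1 and node number 0, as listed in the table.
print(expand_placeholders(TEMPLATE, 1, 0))
```

With the sequential values, the condition becomes mod(datasliceid,1)=0, which is true for every row, so the single node reads the whole table.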