IBM Support

How does Datastage Parallel Engine use scratch disk to buffer data?

Question & Answer


Question

How does Datastage Parallel Engine use scratch disk to buffer data?

Cause

The configuration file can specify how to buffer data in multiple ways using nodes and pools.

Answer

Under certain circumstances, the parallel engine uses both memory and disk storage to buffer virtual data set records. The amount of memory defaults to 3 MB per buffer per processing node. The amount of disk space for each processing node defaults to the amount of available disk space specified in the default scratch disk setting for the node.

The first "resource scratch" directory on each logical node is selected as the first disk used. Each scratch disk in the list will be used to volume capacity and then the next will be used. If all scratch disks defined on each node are consumed, $TMPDIR is used as the scratch disk. If that is used, /tmp is selected.


The parallel engine uses the default scratch disk for temporary storage other than buffering. If you define a buffer scratch disk pool for a node in the configuration file, the parallel engine uses that scratch disk pool rather than the default scratch disk for buffering, and all other scratch disk pools defined are used for temporary storage other than buffering.

Here is an example configuration file that defines a buffer scratch disk pool:

{
node node1 {
fastname "node1_css"
pools "" "node1" "node1_css"
resource disk "/orch/s0" {}
resource scratchdisk "/scratch0" {pools "buffer"}
resource scratchdisk "/scratch1" {}
}
node node2 {
fastname "node2_css"
pools "" "node2" "node2_css"
resource disk "/orch/s0" {}
resource scratchdisk "/scratch0" {pools "buffer"}
resource scratchdisk "/scratch1" {}
}
}

In this example, each processing node has a single scratch disk resource in the buffer pool, so buffering will use /scratch0 but not /scratch1. However, if /scratch0 were not in the buffer pool, both /scratch0 and /scratch1 would be used because both would then be in the default pool. Scratch0 would be used first and then scratch1

[{"Product":{"code":"SSVSEF","label":"IBM InfoSphere DataStage"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Component":"--","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF010","label":"HP-UX"},{"code":"PF016","label":"Linux"},{"code":"PF027","label":"Solaris"},{"code":"PF033","label":"Windows"}],"Version":"8.1;8.5;8.7;9.1","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
16 June 2018

UID

swg21651831