The configuration file

One of the great strengths of InfoSphere® DataStage® is that, when designing jobs, you don’t have to worry too much about the underlying structure of your system, beyond appreciating its parallel processing capabilities. If your system is changed, upgraded, or improved, or if you develop a job on one platform and implement it on another, you don’t necessarily have to change your job design.

InfoSphere DataStage learns about the shape and size of the system from the configuration file. It organizes the resources needed for a job according to what is defined in the configuration file. When your system changes, you change the file, not the jobs.

The configuration file describes available processing power in terms of processing nodes. These might, or might not, correspond to the actual number of processors in your system. You might, for example, want to always leave a couple of processors free to deal with other activities on your system. The number of nodes you define in the configuration file determines how many instances of a process will be produced when you run a parallel job.
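
As a minimal sketch of this idea, the following configuration defines two processing nodes on a single host that is assumed to have four processors, leaving two processors free for other work. The host name `etlhost` and the directory paths are hypothetical and stand in for values from your own system.

```
/* Two logical nodes on one SMP host (host name and paths are hypothetical) */
{
	node "node1"
	{
		fastname "etlhost"
		pools ""
		resource disk "/data/ds/node1" {pools ""}
		resource scratchdisk "/scratch/ds/node1" {pools ""}
	}
	node "node2"
	{
		fastname "etlhost"
		pools ""
		resource disk "/data/ds/node2" {pools ""}
		resource scratchdisk "/scratch/ds/node2" {pools ""}
	}
}
```

With a configuration like this, a stage that runs in parallel would be expected to run as two instances, one per defined node, regardless of how many physical processors the host has.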

Every MPP, cluster, or SMP environment has characteristics that define the system overall as well as the individual processors. These characteristics include node names, disk storage locations, and other distinguishing attributes. For example, certain processors might have a direct connection to a mainframe for performing high-speed data transfers, while others have access to a tape drive, and still others are dedicated to running an RDBMS application. You can use the configuration file to set up node pools and resource pools. A pool defines a group of related nodes or resources, and when you design a parallel job you can specify that execution be confined to a particular pool.
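
The pool mechanism can be sketched as follows. In this illustrative fragment, both nodes belong to the default pool (the empty string), the second node also belongs to a node pool named `db`, and one of its disks belongs to a resource pool named `bigdata`. The pool names, host names, and paths are all assumptions made for the example, not values that the product defines.

```
/* Node pools and resource pools (all names and paths are hypothetical) */
{
	node "node1"
	{
		fastname "etlhost1"
		pools ""                /* default pool only */
		resource disk "/data/ds/node1" {pools ""}
		resource scratchdisk "/scratch/ds/node1" {pools ""}
	}
	node "node2"
	{
		fastname "etlhost2"
		pools "" "db"           /* default pool plus the "db" node pool */
		resource disk "/data/ds/node2" {pools ""}
		resource disk "/bigdata/ds/node2" {pools "bigdata"}
		resource scratchdisk "/scratch/ds/node2" {pools ""}
	}
}
```

Under these assumptions, a stage that is constrained to the `db` node pool would run only on `node2`, while unconstrained stages would use every node in the default pool.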

The configuration file describes every processing node that InfoSphere DataStage will use to run your application. When you run a parallel job, InfoSphere DataStage first reads the configuration file to determine the available system resources.

When you modify your system by adding or removing processing nodes or by reconfiguring nodes, you do not need to alter or even recompile your parallel job. Just edit the configuration file.

The configuration file also gives you control over parallelization of your job during the development cycle. For example, by editing the configuration file, you can first run your job on a single processing node, then on two nodes, then four, then eight, and so on. The configuration file lets you measure system performance and scalability without actually modifying your job.
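
As a sketch of the starting point in that progression, a single-node configuration might look like the following. The host name and paths are hypothetical. To move to two, four, or eight nodes, you would add further `node` blocks with distinct node names while leaving the job itself unchanged.

```
/* Single-node configuration for initial runs (host name and paths are hypothetical) */
{
	node "node1"
	{
		fastname "devhost"
		pools ""
		resource disk "/data/ds/node1" {pools ""}
		resource scratchdisk "/scratch/ds/node1" {pools ""}
	}
}
```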

You can define and edit the configuration file using the Designer client.