Remote deployment

When parallel jobs are deployed remotely, the scripts for the jobs can be stored and run on a computer other than the engine tier host computer.

The remote deployment option can, for example, be used to run jobs in a grid computing environment.

Any remote computer that has a deployed job must have access to the parallel engine to run the job. Such computers must also have the correct runtime libraries for that platform type.
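The prerequisites above can be checked before a deployed job is started. The following Python sketch is illustrative only: it assumes that the remote computer locates the parallel engine through the APT_ORCHHOME environment variable and the configuration file through APT_CONFIG_FILE, and the function name check_engine_environment is hypothetical. It does not verify the runtime libraries themselves.

```python
import os


def check_engine_environment(env=None):
    """Return a list of problems found in the parallel engine environment.

    Assumption: APT_ORCHHOME names the parallel engine directory and
    APT_CONFIG_FILE names the configuration file on this computer.
    """
    env = os.environ if env is None else env
    problems = []

    orch_home = env.get("APT_ORCHHOME")
    if not orch_home:
        problems.append("APT_ORCHHOME is not set")
    elif not os.path.isdir(orch_home):
        problems.append(f"APT_ORCHHOME is not a directory: {orch_home}")

    config_file = env.get("APT_CONFIG_FILE")
    if not config_file:
        problems.append("APT_CONFIG_FILE is not set")
    elif not os.path.isfile(config_file):
        problems.append(f"APT_CONFIG_FILE is not a file: {config_file}")

    return problems
```

An empty list means that both variables point at existing locations. It does not confirm that the job can run.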

Because these jobs are not run on the InfoSphere® DataStage® engine tier, engine tier components (such as BASIC Transformer stages, server shared containers, before and after subroutines, and job control routines) cannot be used. In addition, only a limited set of plug-in stages is available for use in these jobs.

When you run deployed jobs, the logging, monitoring, and operational metadata collection facilities that are provided by InfoSphere DataStage are not available. The output of deployed jobs includes logging information in internal parallel engine format, but you must collect that information yourself.
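One way to collect the output is to start the deployed job from a small wrapper that redirects everything the job writes into a file. The following Python sketch assumes only that the job is started by a command that writes its log to standard output or standard error. The function name run_and_capture and the command and path in the usage comment are hypothetical.

```python
import subprocess
from pathlib import Path


def run_and_capture(command, log_path):
    """Run a command and write its combined stdout and stderr to log_path.

    Returns the exit code of the command. The log content is stored
    unchanged, in whatever format the command produces.
    """
    log_path = Path(log_path)
    # Create the log directory if it does not exist yet.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("wb") as log_file:
        completed = subprocess.run(
            command, stdout=log_file, stderr=subprocess.STDOUT
        )
    return completed.returncode


# Hypothetical usage; the script name and log path are placeholders:
# exit_code = run_and_capture(["./run_deployed_job.sh"], "logs/job.log")
```

The wrapper stores the log as it is. Interpreting the internal parallel engine format is a separate step.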

To prepare a parallel job for deployment, you use the InfoSphere DataStage Designer client to develop the job, and then you compile the job. Such jobs can also be run under the control of the InfoSphere Information Server engine (by using the Designer or Director clients, or by using the dsjob command). When you use any of these methods to run the jobs, the executable files in the project directory are used, not the deployment scripts.

Before you can run a deployed job on the remote computer, you must define a configuration file on the remote computer, transfer the deployment package to the remote computer, and complete other configuration steps on the remote computer.
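As an illustration of the configuration file step, the following Python sketch generates the text of a minimal configuration file. It assumes the commonly used parallel engine layout of node entries with fastname, pools, and resource lines. The node names, host names, and directory paths are placeholders, and the function name render_config is hypothetical. Check the generated file against the configuration file documentation for your release before you use it.

```python
def render_config(nodes):
    """Return configuration file text for a list of node descriptions.

    Each node is a dict with the keys name, fastname, disk, and
    scratchdisk. All values in the usage example are illustrative.
    """
    lines = ["{"]
    for node in nodes:
        lines += [
            f'  node "{node["name"]}"',
            "  {",
            f'    fastname "{node["fastname"]}"',
            '    pools ""',
            f'    resource disk "{node["disk"]}" {{pools ""}}',
            f'    resource scratchdisk "{node["scratchdisk"]}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical usage with placeholder host names and paths:
# text = render_config([
#     {"name": "node1", "fastname": "conductor1",
#      "disk": "/data/datasets", "scratchdisk": "/data/scratch"},
# ])
```

Writing the returned text to a file on the remote computer, and pointing the job at that file, is left to the deployment procedure.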

The following diagram shows a conceptual view of an example deployment system. In this example, deployable jobs are transferred to three conductor node computers. The engine tier host computer and the conductor node computers run the same operating system. Each conductor node has a configuration file that describes the resources that are available for running the jobs. The jobs then run under the control of that conductor node computer.

[Figure: Example deployment system]