If a DataStage parallel job deployed as a provider for an Information Services Director application is configured with a multiple node configuration file, the responses could be out of order.
** Updated June 10, 2012 **
IBM has released a fix for APAR JR41223, and a utility to identify jobs which may be impacted by this issue, JR41718. Download fixes for JR41223 and JR41718 from IBM Fix Central.
In InfoSphere DataStage parallel jobs, when a multi-node configuration file is used, it is possible for data to be passed through different paths and at different speeds between the stages of the job. Because of this architecture, it is possible that the data records which make up an Information Services Director (ISD) service request could arrive at the ISD output stage in a different order than they were received. When this situation occurs, the response data produced by the job for one request can be directed to a different request. It is also possible that the response for one request could have both its own data plus the data that was meant for a different request. In summary, when a multiple node configuration file is used and multiple requests are processed simultaneously, there is no guarantee that the response provided will contain the correct data. This issue is reported as APAR JR41223.
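The reordering described above can be illustrated with a toy model. This is only a sketch and not how the parallel engine is implemented: it assumes records are distributed round-robin across nodes and that each node adds a fixed delay, and the function name `arrival_order` is hypothetical.

```python
def arrival_order(num_records, node_delays):
    """Return record ids in the order they reach the output stage.

    Toy model (an assumption, not DataStage internals): record i enters the
    job at time i, is sent round-robin to one of the nodes, and reaches the
    output stage after that node's fixed delay.
    """
    arrivals = []
    for record_id in range(num_records):
        node = record_id % len(node_delays)
        arrival_time = record_id + node_delays[node]
        arrivals.append((arrival_time, record_id))
    # Sort by arrival time; ties are broken by record id.
    arrivals.sort()
    return [record_id for _, record_id in arrivals]


# One node: every record takes the same path, so input order is preserved.
single_node = arrival_order(6, [2])

# Two nodes with different delays: records overtake each other.
two_nodes = arrival_order(6, [1, 4])

print(single_node)
print(two_nodes)
```

In this model the single-node run returns the records in their original order, while the two-node run does not, which is the condition under which a response can be matched to the wrong request.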
This issue occurs only for jobs that are deployed as ISD providers and that meet all of the following criteria:
- The job has an ISD Input stage.
- The configuration file used by the job (APT_CONFIG_FILE) contains multiple nodes.
- Multiple requests can be processed by the job at the same time.
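As a quick manual check of the second criterion, the number of node definitions in a configuration file can be counted. The following is a minimal sketch and is not the IBM utility mentioned in this document; the helper names `count_nodes` and `is_multi_node` are hypothetical, the sample host name and paths are placeholders, and the simple pattern match does not account for comments in the file.

```python
import re

# Matches the start of each node definition, for example:  node "node1"
NODE_PATTERN = re.compile(r'\bnode\s+"[^"]*"')


def count_nodes(config_text):
    """Return the number of node definitions found in the configuration text."""
    return len(NODE_PATTERN.findall(config_text))


def is_multi_node(config_text):
    """Return True if the configuration text defines more than one node."""
    return count_nodes(config_text) > 1


# Placeholder two-node configuration used only to exercise the helpers.
SAMPLE_TWO_NODES = '''
{
  node "node1"
  {
    fastname "etl-host"
    pools ""
    resource disk "/data/ds/node1" {pools ""}
    resource scratchdisk "/scratch/ds/node1" {pools ""}
  }
  node "node2"
  {
    fastname "etl-host"
    pools ""
    resource disk "/data/ds/node2" {pools ""}
    resource scratchdisk "/scratch/ds/node2" {pools ""}
  }
}
'''

print(count_nodes(SAMPLE_TWO_NODES), is_multi_node(SAMPLE_TWO_NODES))
```

To check a real file, read the file that APT_CONFIG_FILE points to and pass its contents to `is_multi_node`.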
To assist you in determining if you have an exposure to this issue, IBM has created a utility which will search for DataStage jobs which meet the criteria listed previously. The results from running this utility will direct the DataStage developers to check these jobs and confirm if any of the workarounds provided below must be applied to prevent this issue.
Workaround #1 - Preferred
The preferred solution to this problem is to ensure that the job uses only a single-node configuration file. This can be done at the job level by adding the environment variable $APT_CONFIG_FILE to the job properties and setting it to reference a configuration file that defines only a single node. Once this change has been made, the job must be disabled within Information Services Director, recompiled in DataStage Designer, and then re-enabled in the ISD console.
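For illustration, a single-node configuration file generally has the following shape. The node name, the fastname (host name), and the resource paths shown here are placeholders rather than values from this document; substitute the values used in your own environment.

```
{
  node "node1"
  {
    fastname "your-engine-host"
    pools ""
    resource disk "/path/to/datasets" {pools ""}
    resource scratchdisk "/path/to/scratch" {pools ""}
  }
}
```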
Note: If all jobs in the project are ISD jobs, you can alternatively implement this solution by changing the $APT_CONFIG_FILE environment variable at the DataStage project level to point to a single-node configuration file by default. Even when making a project-level change, you must still disable and then re-enable the jobs in the ISD console (though recompilation is not necessary). Be aware, however, that a job can override the project-level default by defining its own value for $APT_CONFIG_FILE. In that situation, you must modify the job as described above to ensure that it points to a single-node configuration file.
Workaround #2 - Alternative method
This method should only be used for temporary relief until the job can be disabled, and the configuration file changed. From the Deployed Information Services Application page of the ISD console, change the Request Limit property for the provider to 1. With a Request Limit of 1, it is not possible for multiple requests to be processed by a job at the same time. Making this change does not require that the application be redeployed or disabled/enabled.