IBM Support

InfoSphere DataStage and QualityStage, Version 9.1 Job Compatibility

Technote (troubleshooting)


Problem(Abstract)

IBM InfoSphere DataStage and QualityStage, Version 9.1 introduces new features and functions that affect existing Version 8.7 jobs.

When you upgrade or migrate from InfoSphere DataStage or InfoSphere QualityStage, Version 8.7 to InfoSphere DataStage or InfoSphere QualityStage, Version 9.1, you might need to make changes to your parallel jobs before you can run them successfully in Version 9.1. No changes are required for server jobs.

Resolving the problem

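This topic has the following sections: Changes to Parallel Jobs; Changes to Server Jobs; Updates to Data Sets; InfoSphere QualityStage Updates, Enhancements, and Fixes; Manual Changes Required after Upgrading from Version 8.7 to Version 9.1; Updates that Require Job Recompilation; Recompiling after Migration.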

Changes to Parallel Jobs

The following environment variables might need to be set so that parallel jobs created in Version 8.7 run correctly after you upgrade to Version 9.1.


Each entry below describes a behavior change, followed by the environment variable that controls it.
In Version 8.7 and earlier, an invalid day in a date is changed to the maximum valid day of the month plus the remaining days. For example, January 40, 2011 is calculated as January 31, 2011 + 9 days, which results in February 9, 2011. This behavior applies only to year and month date arithmetic, not to day arithmetic. In Version 9.1, if the day value is greater than the maximum valid day of the month, it is adjusted to the maximum valid day. To restore Version 8.7 functionality, set the following environment variable:

APT_DATE_ADD_ROLLOVER

For example, in Version 9.1, if you add one month to January 31, 2011, the result is February 28, 2011. If you set the environment variable, the result is March 3, 2011 (February 28 plus the 3 remaining days).
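The following Python sketch models the two day-adjustment rules for month arithmetic described above. It is an illustration only, not the engine's implementation: the function names normalize_day and add_months are hypothetical, and the rollover flag stands in for whether APT_DATE_ADD_ROLLOVER is set.

```python
import calendar
from datetime import date, timedelta


def normalize_day(year: int, month: int, day: int, rollover: bool) -> date:
    """Resolve a day-of-month that may be invalid, such as January 40."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    if day <= last_day:
        return date(year, month, day)
    if rollover:
        # Version 8.7 behavior (APT_DATE_ADD_ROLLOVER set):
        # last valid day of the month plus the remaining days.
        return date(year, month, last_day) + timedelta(days=day - last_day)
    # Version 9.1 default: clamp to the last valid day of the month.
    return date(year, month, last_day)


def add_months(d: date, months: int, rollover: bool) -> date:
    """Add whole months to d, then resolve the day with normalize_day()."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    return normalize_day(year, month0 + 1, d.day, rollover)


print(add_months(date(2011, 1, 31), 1, rollover=False))  # 2011-02-28
print(add_months(date(2011, 1, 31), 1, rollover=True))   # 2011-03-03
print(normalize_day(2011, 1, 40, rollover=True))         # 2011-02-09
```

Keeping the day resolution in its own function mirrors the technote's point that the rule applies to year and month arithmetic, where the day can land past the end of the target month.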
In Version 8.7, when the APT_TRANSLITERATE_CASE_USES_LATIN1 environment variable is defined, the Latin-1 encoding is used for upper-case and lower-case conversions of string columns. The APT_TRANSLITERATE_CASE_USES_NLS_MAP environment variable that was provided in earlier releases is ignored, because it corresponds to the default behavior of the upper-case and lower-case functions. In Version 9.1, by default, the job's NLS map is used for upper-case and lower-case conversions of string columns.

To restore Version 8.7 functionality, set the following environment variable:

APT_TRANSLITERATE_CASE_USES_LATIN1
In Version 9.1, setting the APT_NO_OP_PSOUTPUT environment variable retains the Version 8.7 format of process listings in the output of the UNIX ps command.

In Version 9.1, the ps output for parallel osh processes has operator names appended to each process entry.

The output in Version 9.1 is similar to the following:
root 27770 27762 0 02:45 ? 00:00:00 /opt/IBM/InformationServer/Server/PXEngine/bin/osh -APT_PMsectionLeaderFlag ipshyd46.in.ibm.com 10001 0 30 node1 ipshyd46.in.ibm.com 1335422727.547852.6c51 0 sequential ISD_Output_1

The output in Version 8.7 was similar to the following:
root 29277 29274 0 Apr24 ? 00:00:00 /opt/IBM/InformationServer/Server/PXEngine/bin/osh -APT_PMsectionLeaderFlag ipshyd46.in.ibm.com 10002 0 30 node1 ipshyd46.in.ibm.com 1335256554.704362.7243 0

For example, when the variable is set, ps -ef | grep osh lists the osh processes without appended operator names. When the variable is not set, the operator name is appended to the associated process in the ps output. This environment variable is not supported on Windows platforms, because ps output is usually truncated there. On Sun Microsystems (Solaris) platforms, it works only if you use the GNU version of the ps command.
Set the following environment variable if you want to generate ps command output without appended operator names:
APT_NO_OP_PSOUTPUT
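If you monitor jobs with scripts that parse ps output, the appended operator names can affect them. The Python sketch below separates the appended information from the section-leader arguments. It is a heuristic inferred only from the two sample lines above: the assumption that exactly eight arguments follow -APT_PMsectionLeaderFlag is not documented, and the function name appended_operator_info is hypothetical.

```python
from typing import List

FLAG = "-APT_PMsectionLeaderFlag"
# Assumption: inferred from the two sample lines in this technote,
# where eight arguments follow the section-leader flag.
LEADER_ARG_COUNT = 8


def appended_operator_info(ps_line: str) -> List[str]:
    """Return the tokens appended after the section-leader arguments.

    An empty list means nothing was appended: the Version 8.7 format,
    or Version 9.1 with APT_NO_OP_PSOUTPUT set.
    """
    tokens = ps_line.split()
    if FLAG not in tokens:
        return []  # not an osh section-leader process line
    start = tokens.index(FLAG) + 1 + LEADER_ARG_COUNT
    return tokens[start:]


line_91 = ("root 27770 27762 0 02:45 ? 00:00:00 "
           "/opt/IBM/InformationServer/Server/PXEngine/bin/osh "
           "-APT_PMsectionLeaderFlag ipshyd46.in.ibm.com 10001 0 30 node1 "
           "ipshyd46.in.ibm.com 1335422727.547852.6c51 0 sequential ISD_Output_1")
print(appended_operator_info(line_91))  # ['sequential', 'ISD_Output_1']
```

Because the argument count is taken from sample output rather than a specification, verify it against the ps output on your own systems before relying on it.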
The default transport type for the I/O communication layer is shared memory, which uses a mapped memory region and a named pipe (FIFO). On Microsoft Windows, sockets can be more efficient than shared memory, so in Version 9.1 they are the default transport type on that platform. You can restore shared memory as the default transport type on Windows by setting this variable to any value. This variable applies only to Microsoft Windows platforms. To restore Version 8.7 functionality, set the following environment variable:

APT_NO_IOCOMM_OPTIMIZATION
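The transport selection rule above can be summarized in a short Python sketch. The function name default_transport and the platform labels are hypothetical; the sketch treats the variable as set whenever it is present, which matches the statement that any value restores shared memory.

```python
from typing import Mapping


def default_transport(platform: str, env: Mapping[str, str]) -> str:
    """Return the I/O communication transport implied by the technote.

    platform: hypothetical label, "windows" or any other value.
    env: the job's environment variables.
    """
    if platform != "windows":
        # Shared memory (mapped memory region plus a named pipe) is the default.
        return "shared_memory"
    if "APT_NO_IOCOMM_OPTIMIZATION" in env:
        # Any value restores shared memory on Windows (Version 8.7 behavior).
        return "shared_memory"
    # Version 9.1 default on Windows.
    return "sockets"


print(default_transport("windows", {}))  # sockets
```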
In Version 8.7 and earlier, upper-case and lower-case conversion of APT_String values used a single-character conversion routine, which can produce incorrect results for a small number of characters in certain locales. That routine also does not allow the string to expand during upper-case and lower-case conversion, which some characters require.
In Version 9.1, a whole-string conversion algorithm is used. The algorithm produces correct results for all characters and allows the string to expand.

To restore Version 8.7 functionality, set the following environment variable:

APT_SINGLE_CHAR_CASE

Setting this environment variable reverts to the earlier single-character algorithm.
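The following Python sketch illustrates why string expansion matters, using the German sharp s (ß), whose upper-case form is the two characters SS. It relies on Python's built-in str.upper(), which applies full Unicode case mapping; it is not the APT_String implementation, the function names are hypothetical, and it does not model the locale-specific cases mentioned above.

```python
def upper_single_char(s: str) -> str:
    """Per-character conversion that never lets the string grow."""
    out = []
    for ch in s:
        up = ch.upper()
        # Keep the original character when its upper-case form is longer
        # than one character, because this routine cannot expand the string.
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def upper_whole_string(s: str) -> str:
    """Whole-string conversion that allows the result to expand."""
    return s.upper()


word = "straße"
print(upper_single_char(word))   # STRAßE
print(upper_whole_string(word))  # STRASSE
```

The single-character version leaves ß unconverted because its correct upper-case form does not fit in one character, which is the kind of result the whole-string algorithm avoids.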
Starting in Version 9.1, setting the APT_IO_MAXIMUM_OUTSTANDING environment variable enables flow control unless APT_IO_NO_FLOW_CONTROL is set. Set the following environment variable if you want to enable flow control:

APT_IO_MAXIMUM_OUTSTANDING
In Version 9.1, use the APT_IO_NO_FLOW_CONTROL environment variable to specify that flow control is not used in the job. Flow control is disabled by default in Version 9.1, so you do not need to set this variable unless another variable enables flow control. This environment variable takes precedence over APT_IO_FORCE_FLOW_CONTROL, APT_SENDBUFSIZE, APT_RECVBUFSIZE, APT_IO_CHECK_SEND_SIDE, and APT_IO_MAXIMUM_OUTSTANDING.
In Version 9.1, use the APT_IO_FORCE_FLOW_CONTROL environment variable to specify that flow control is always used in a job. Set this variable to enable flow control:

APT_IO_FORCE_FLOW_CONTROL

Flow control is disabled by default.
Starting in Version 9.1, setting APT_IO_CHECK_SEND_SIDE enables flow control unless APT_IO_NO_FLOW_CONTROL is set. Set this variable to specify that flow control checks the send-side parameters as well as the receive-side parameters:

APT_IO_CHECK_SEND_SIDE
Starting in Version 9.1, setting the APT_SENDBUFSIZE and APT_RECVBUFSIZE environment variables enables flow control unless APT_IO_NO_FLOW_CONTROL is set.


To set the flow control window explicitly, specify one of the following environment variables:

APT_SENDBUFSIZE
APT_RECVBUFSIZE

If both are set, APT_RECVBUFSIZE takes precedence. If neither is set, APT_IO_MAXIMUM_OUTSTANDING is used to compute the flow control window.
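The flow control rules above can be summarized in a Python sketch. It is a reading of this technote, not the engine's code: the function names are hypothetical, a variable counts as set whenever it is present with any value (an assumption), and because the technote does not describe how the window size is computed, the sketch only reports which variable determines it.

```python
from typing import Mapping, Optional

# Variables that turn flow control on in Version 9.1 when they are set.
ENABLING_VARS = (
    "APT_IO_FORCE_FLOW_CONTROL",
    "APT_IO_MAXIMUM_OUTSTANDING",
    "APT_IO_CHECK_SEND_SIDE",
    "APT_SENDBUFSIZE",
    "APT_RECVBUFSIZE",
)


def flow_control_enabled(env: Mapping[str, str]) -> bool:
    """Return True if flow control is used for the given environment."""
    # APT_IO_NO_FLOW_CONTROL takes precedence over every enabling variable.
    if "APT_IO_NO_FLOW_CONTROL" in env:
        return False
    # Flow control is disabled by default; any enabling variable turns it on.
    return any(name in env for name in ENABLING_VARS)


def window_source(env: Mapping[str, str]) -> Optional[str]:
    """Name the variable that determines the flow control window, or None."""
    if not flow_control_enabled(env):
        return None
    if "APT_RECVBUFSIZE" in env:  # takes precedence over APT_SENDBUFSIZE
        return "APT_RECVBUFSIZE"
    if "APT_SENDBUFSIZE" in env:
        return "APT_SENDBUFSIZE"
    # Neither buffer size is set: the window is computed from
    # APT_IO_MAXIMUM_OUTSTANDING.
    return "APT_IO_MAXIMUM_OUTSTANDING"


print(flow_control_enabled({}))  # False
print(window_source({"APT_SENDBUFSIZE": "1", "APT_RECVBUFSIZE": "1"}))  # APT_RECVBUFSIZE
```

The values "1" in the usage lines are placeholders; only the presence of each variable matters in this sketch.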
Starting in Version 9.1, use the APT_ONE_NODE_CONSTRAINT environment variable to determine whether a one-node constraint is applied to the processing operators of a data flow. A processing operator has at least one input link and one output link. When the variable is set to true, the one-node constraint is applied automatically. When it is set to false, the one-node constraint is not applied. For InfoSphere Information Services Director jobs, this environment variable is set to true by default if APT_CONFIG_FILE contains more than one node; you can set the variable to false to disable the one-node constraint. For applications that do not use end-of-wave (EOW) processing, and for EOW applications other than InfoSphere Information Services Director, this environment variable is set to false by default; you can set it to true to enable the one-node constraint. Set the following environment variable to specify whether the one-node constraint is applied to the processing operators of a data flow:

APT_ONE_NODE_CONSTRAINT
In Version 9.1, the APT_NO_WAIT_EOW environment variable applies only to EOW applications. The variable determines how EOW markers are collected when they are propagated through multiple partitions. By default, when a communication channel encounters an EOW marker, the framework puts that channel on hold and continues to read from the other communication channels until all of them have encountered EOW markers. Setting the environment variable disables this behavior, so that the framework reads from the communication channels that have already reached EOW without waiting for the other channels to reach EOW first.
To apply a one-node constraint in EOW applications, set both environment variables as follows:

APT_NO_WAIT_EOW=1
APT_ONE_NODE_CONSTRAINT=true

If APT_ONE_NODE_CONSTRAINT=true is set but APT_NO_WAIT_EOW is not set, APT_ONE_NODE_CONSTRAINT is ignored.
In Version 9.1, the APT_WAIT_EOW_TIMEOUT environment variable applies only to EOW applications.
The APT_WAIT_EOW_TIMEOUT environment variable controls how long the framework waits for all EOW markers to arrive before it terminates the job. The default timeout value is 120 seconds. You can set it to any positive integer number of seconds. If a negative value is specified, the default value is used.
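The timeout rules can be expressed as a small Python parsing function. The function name eow_timeout_seconds is hypothetical, and the handling of 0 and of non-integer values (both fall back to the default here) is an assumption, because the technote describes only positive and negative values.

```python
from typing import Optional

DEFAULT_EOW_TIMEOUT_SECONDS = 120


def eow_timeout_seconds(raw: Optional[str]) -> int:
    """Interpret an APT_WAIT_EOW_TIMEOUT value (None means the variable is unset)."""
    if raw is None:
        return DEFAULT_EOW_TIMEOUT_SECONDS  # unset: use the default
    try:
        value = int(raw.strip())
    except ValueError:
        # Assumption: a non-integer value falls back to the default.
        return DEFAULT_EOW_TIMEOUT_SECONDS
    if value > 0:
        return value  # positive integer: number of seconds to wait
    # Negative values use the default; treating 0 the same way is an assumption.
    return DEFAULT_EOW_TIMEOUT_SECONDS


print(eow_timeout_seconds("300"))  # 300
print(eow_timeout_seconds("-5"))   # 120
```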
In Version 8.7, when a player failed, it still sent end of data (EOD) downstream.
In Version 9.1, a failing player does not send EOD downstream, because the downstream player would then receive end-of-file (EOF) markers indicating that the job finished normally.

If the downstream player is a distributed transaction stage (DTS) that relies on an EOF marker to commit records to database tables, sending EOD when a job is failing can cause a partial commit. By default, EOD is not sent if a job is failing.

To restore Version 8.7 functionality, set the following environment variable:

APT_FORCE_SEND_EOD

Changes to Server Jobs

No changes are required for server jobs.

Updates to Data Sets

Data sets that were created in Version 8.7 are not compatible with Version 9.1. If you have data sets that persist between jobs, you must apply a patch before Version 9.1 jobs can read the Version 8.7 data sets. Obtain and apply the patch for APAR JR46023.

InfoSphere QualityStage Updates, Enhancements, and Fixes

Update: Effective with Version 9.1 of InfoSphere Information Server, the InfoSphere QualityStage Legacy stage is no longer supported. Before you upgrade or migrate to Version 9.1, rewrite any jobs that use the QualityStage Legacy stage. Remove all jobs that use QualityStage Legacy stages and that were created in earlier versions of InfoSphere DataStage and QualityStage, either before you upgrade or before you export them from an earlier version for import into Version 9.1.

Enhancement: There are new delivered rule sets to process Russian name and address data.
Rule sets affected: RUADDRL, RUNAMEL

Enhancement: There is a new delivered rule set to process pharmaceutical data.
Rule set affected: PHPROD

Fix: The CNPHONE rule set was modified to convert the input to narrow characters, and handling for mobile phone numbers was added.
Rule set affected: CNPHONE
Potential change to output: YES

Fix: The HKPHONE rule set was modified to convert the input to narrow characters, and general processing was added so that a delimiter is no longer required.
Rule set affected: HKPHONE
Potential change to output: YES
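As a rough illustration of what converting input to narrow characters means, Unicode NFKC normalization maps full-width forms, such as full-width digits and punctuation, to their narrow ASCII equivalents. This Python sketch uses the standard unicodedata module; it is not how the CNPHONE or HKPHONE rule sets are implemented, the function name to_narrow is hypothetical, and NFKC also applies other compatibility mappings beyond width.

```python
import unicodedata


def to_narrow(text: str) -> str:
    # NFKC replaces full-width forms such as U+FF11 (FULLWIDTH DIGIT ONE)
    # with their narrow equivalents, among other compatibility mappings.
    return unicodedata.normalize("NFKC", text)


print(to_narrow("（８５２）２１２３－４５６７"))  # (852)2123-4567
```

A change of this kind explains why these fixes are flagged with a potential change to output: the same phone number entered in full-width form now produces narrow-character output.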

Manual Changes Required after Upgrading from Version 8.7 to Version 9.1

On Windows platforms, if you use Visual Studio 2008, copy PXEngine/etc/osh.exe.manifest-vs2008 to PXEngine/etc/osh.exe.manifest.

Updates that Require Job Recompilation

On Windows platforms, parallel jobs that contain Transformer, Lookup, or Slowly Changing Dimension stages must be recompiled; otherwise, the jobs fail to run.

Recompiling after Migration

If you use the InfoSphere Information Server migration tool to migrate from a prior version of InfoSphere Information Server, then all jobs must be recompiled.

Cross reference information
Segment: Information Management; Product: InfoSphere Information Server; Version: 9.1
Segment: Information Management; Product: InfoSphere QualityStage; Version: 9.1

Document information

More support for: InfoSphere DataStage

Software version: 9.1

Operating system(s): AIX, HP-UX, Linux, Solaris, Windows

Reference #: 1627562

Modified date: 20 February 2017