IBM Support

A new feature to generate stack traces for Parallel jobs starting at version 9.1 of DataStage

Question & Answer


Question

How can I generate stack traces for parallel jobs in DataStage version 9.1 and later?

Answer

Starting at version 9.1, DataStage provides a facility that generates stack traces and captures other diagnostic information for parallel jobs.

The following user-defined environment variables control this feature:
  • APT_DUMP_STACK - Setting this to 1 enables the basic stack trace dump.
  • APT_DUMP_STACK_DIRECTORY - When set to a valid path, the dump files are created in the specified directory. If it is undefined or not set to a valid path, the dump files are created in $TMPDIR if TMPDIR is defined; otherwise they are created in /tmp on Unix/Linux and in %TEMP% on Windows.

    Please note that the specified directory needs to exist on all systems if the parallel engine is used in an MPP or cluster configuration.
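The directory fallback described above can be sketched for Unix/Linux as a small shell function. This is only an illustration: the name resolve_dump_dir is hypothetical and not part of DataStage, the engine performs its own resolution internally, and "valid path" is assumed here to mean an existing directory.

```shell
#!/bin/sh
# Hypothetical helper: mirrors the documented fallback order for the dump
# directory on Unix/Linux. It does not call any DataStage API.
resolve_dump_dir() {
    # Assumption: a "valid path" is an existing directory.
    if [ -n "${APT_DUMP_STACK_DIRECTORY:-}" ] && [ -d "$APT_DUMP_STACK_DIRECTORY" ]; then
        printf '%s\n' "$APT_DUMP_STACK_DIRECTORY"
    elif [ -n "${TMPDIR:-}" ]; then
        # Documented fallback when the directory variable is unset or invalid.
        printf '%s\n' "$TMPDIR"
    else
        # Documented last resort on Unix/Linux.
        printf '%s\n' "/tmp"
    fi
}
```

Running resolve_dump_dir on each node of an MPP or cluster configuration, in the same environment the job uses, is one way to check where the dump files would be expected.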

Once APT_DUMP_STACK is set, the parallel framework invokes the feature automatically when an unrecoverable exception, such as a segmentation fault, occurs. Not all errors generate a signal that causes a stack trace.

Note: This applies to parallel jobs only. It is not applicable to server or sequence jobs.

If the job is successful, no dump is created, so you can leave the variable set in order to capture a dump for an intermittent issue.

The files created will be named: px_engine_dump_YYYY_MM_DD_HH_MM_SS_PID

For example: px_engine_dump_2013_06_07_16_07_16_3228
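Because the file name encodes the timestamp and the process ID, the dump for a given process can be located with simple pattern matching. The following is a minimal sketch, assuming the names follow exactly the pattern above; the function names are hypothetical and not part of DataStage.

```shell
#!/bin/sh
# Hypothetical helpers for working with px_engine_dump_* file names.

# Print the PID, which is the last underscore-separated field of the name.
dump_pid() {
    name=$(basename "$1")
    printf '%s\n' "${name##*_}"
}

# Print the timestamp portion of the name as "YYYY-MM-DD HH:MM:SS".
dump_timestamp() {
    name=$(basename "$1")
    # Strip the fixed prefix, then the trailing _PID.
    stamp=${name#px_engine_dump_}
    stamp=${stamp%_*}
    # stamp now has the form YYYY_MM_DD_HH_MM_SS.
    echo "$stamp" | awk -F_ '{ printf "%s-%s-%s %s:%s:%s\n", $1, $2, $3, $4, $5, $6 }'
}

# List dump files for one PID in a directory (defaults to /tmp).
find_dumps_for_pid() {
    pid=$1
    dir=${2:-/tmp}
    ls "$dir"/px_engine_dump_*_"$pid" 2>/dev/null
}
```

For the example name above, dump_pid prints 3228 and dump_timestamp prints 2013-06-07 16:07:16.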

This feature is available on Unix/Linux and Windows. On Windows, it provides information that was not previously available, because there is no core file from which to get a stack trace. On Unix/Linux, it does not rely on a debugger being installed.

To generate a trace on demand by sending SIGABRT to a job that is deadlocked or hung, set the following additional environment variables:
  • APT_DUMP_STACK_PERIOD=0
  • APT_PM_SHOW_PIDS=True
  • APT_DUMP_SCORE=True

    Note: APT_DUMP_STACK_PERIOD needs to be defined as a user-defined environment variable.
    • When APT_DUMP_STACK_PERIOD is set along with APT_DUMP_STACK, you can obtain a trace by sending a SIGABRT to a process without aborting the process/job.
    • If APT_DUMP_STACK is not enabled, the handler generates a stack trace and aborts the process/job.
    • When the job encounters a deadlock/hang, you need to identify the process that is hung and send it a SIGABRT. Having the PIDs and the dump score in the job log can help with this.
    • Once you have identified the process send the SIGABRT using:
      kill -s ABRT <pid>
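The signalling step can be wrapped in a small guard so that a mistyped PID does not signal the wrong target. The following is a minimal sketch for Unix/Linux; request_stack_dump is a hypothetical name, and the sketch uses the POSIX signal name ABRT. Whether the target process writes a trace and continues, or aborts, depends on the environment variables described above.

```shell
#!/bin/sh
# Hypothetical helper: send SIGABRT to one process after basic sanity checks.
request_stack_dump() {
    pid=$1
    # Reject empty input, non-digits, and anything starting with 0
    # (PID 0 would address the caller's whole process group).
    case "$pid" in
        ''|*[!0-9]*|0*)
            echo "usage: request_stack_dump <pid>" >&2
            return 2
            ;;
    esac
    # kill -0 delivers no signal; it only checks that the process exists
    # and that the caller is allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process $pid not found or not signalable" >&2
        return 1
    fi
    kill -s ABRT "$pid"
}
```

A typical call would be request_stack_dump followed by the PID taken from the job log, after which the dump directory can be checked for a new px_engine_dump file ending in that PID.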


Document Information

Modified date:
17 August 2023

UID

swg21639558