IBM Support

How to get a stack trace for failing processes in a DataStage Parallel Job, Linux platforms

Question & Answer


Question

When a DataStage Parallel job has processes that fail with SIGSEGV, SIGILL, or other abnormal terminations that cause the process to dump core, use the following instructions to obtain stack traces from the core dumps.

Answer

Follow these instructions so that the failing DataStage processes create core dumps, and then obtain the stack traces from those core files. These steps require the gdb debugger to be installed on the system. If gdb is not installed, contact your system administrator to have it installed before completing these steps.
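One way to confirm that gdb is available before you start is to check whether it is on the PATH of the user who will analyze the core files. This is a minimal sketch assuming bash; `require_gdb` is a hypothetical helper name, not a DataStage or gdb command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DataStage or gdb): succeeds if a gdb
# executable is found on the current user's PATH.
require_gdb() {
  if command -v gdb >/dev/null 2>&1; then
    echo "gdb found at: $(command -v gdb)"
    return 0
  fi
  echo "gdb not found; ask your system administrator to install it" >&2
  return 1
}
```

Define the function in your shell and run `require_gdb`; it returns a non-zero status when gdb cannot be found.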

Part 1 - Linux Operating System Pre-requisites

You must first ensure that your system will permit core files to be created. These steps may require superuser access.

  1. Edit the /etc/security/limits.conf file and ensure that the hard and soft limits for core files are set to unlimited for the root user. This is necessary because the DataStage processes are forked from the dsrpcd program, which runs as root, and they inherit its environment, including its resource limits.

  2. Edit the /etc/sysctl.conf file, and add the following line:

    kernel.core_uses_pid = 1

    Save the file, and reload this setting into the active kernel by issuing the command:

    sysctl -p

    This setting appends the process ID to each core file name, so that multiple DataStage processes can create core files without overwriting each other's.

  3. If you made changes to the limits file, log completely out of the login session (ssh, telnet, etc.) and reconnect. Once reconnected, stop and restart the dsrpcd program.
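The three Part 1 steps can be checked from a shell. The sketch below is an illustration only, assuming bash: `root_core_is_unlimited`, `ensure_core_uses_pid` and `core_limits_of` are hypothetical helper names, not DataStage or Linux commands. In /etc/security/limits.conf each entry has the form `<domain> <type> <item> <value>`, so the entries step 1 asks for would typically read `root soft core unlimited` and `root hard core unlimited` (a type of `-` sets both). Files under /etc/security/limits.d/ can also set limits and are not examined here. The last helper reads the Linux /proc/<pid>/limits file to confirm what the restarted dsrpcd actually inherited.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the Part 1 settings (not DataStage commands).

# Step 1: succeed only if a limits.conf-style file sets both the soft and the
# hard "core" limit to "unlimited" for root. A type of "-" counts as both.
# Files under /etc/security/limits.d/ are not examined.
root_core_is_unlimited() {
  local file="${1:-/etc/security/limits.conf}"
  awk '
    $1 ~ /^#/ { next }
    $1 == "root" && $3 == "core" && $4 == "unlimited" {
      if ($2 == "soft" || $2 == "-") soft = 1
      if ($2 == "hard" || $2 == "-") hard = 1
    }
    END { exit !(soft && hard) }
  ' "$file"
}

# Step 2: append "kernel.core_uses_pid = 1" unless the file already sets
# kernel.core_uses_pid; an existing line is printed so its value can be checked.
# Run as root, then apply the file with "sysctl -p".
ensure_core_uses_pid() {
  local file="${1:-/etc/sysctl.conf}"
  if grep -Eq '^[[:space:]]*kernel\.core_uses_pid[[:space:]]*=' "$file"; then
    echo "kernel.core_uses_pid is already set in $file:"
    grep -E '^[[:space:]]*kernel\.core_uses_pid' "$file"
  else
    echo 'kernel.core_uses_pid = 1' >> "$file"
    echo "added kernel.core_uses_pid = 1 to $file"
  fi
}

# Step 3: print the soft and hard core file size limits from a
# /proc/<pid>/limits file, whose relevant row looks like:
#   Max core file size        unlimited            unlimited            bytes
core_limits_of() {
  awk '/^Max core file size/ { print "soft=" $5, "hard=" $6 }' "$1"
}

# Example usage (the dsrpcd line assumes exactly one process named dsrpcd):
#   root_core_is_unlimited && echo "limits.conf OK"
#   ensure_core_uses_pid && sysctl -p
#   core_limits_of "/proc/$(pgrep -x dsrpcd)/limits"
```

After `sysctl -p`, running `sysctl -n kernel.core_uses_pid` should print `1`. If `core_limits_of` reports anything other than `unlimited` for the restarted dsrpcd, the daemon has not picked up the new limits.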

Part 2 - DataStage Pre-requisites

Complete these steps before running the job. If you are able to modify the individual job, follow these steps exactly. If you cannot modify the job (for example, in a read-only production environment), you can use the DataStage Administrator client to set these variables at the project level instead. Changing them at the project level requires DataStage Administrator access, and the resulting settings take effect for ALL jobs in the project.


  1. In the Job Properties > Environment Variables section, check whether the variable APT_NO_PM_SIGNAL_HANDLERS exists. If it does not, create it. Set its value to '1' (without quotes). This tells the DataStage processes not to trap the signals, which allows a core dump to be created.

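  2. In the Job Properties > Environment Variables section, select the following environment variables from the 'Reporting' section and set them both to TRUE for this job: APT_DUMP_SCORE and APT_PM_SHOW_PIDS. APT_DUMP_SCORE writes the job score to the log, and APT_PM_SHOW_PIDS writes the process IDs of the job's processes to the log, which helps match each core file to the process that produced it.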

  3. Save and recompile the job.


Part 3 - Run the job and collect information

  1. Run the job and examine the log to confirm that the job failed as expected. In the DataStage Director, make sure that the detailed job log view DOES NOT have any filtering enabled. Print the *complete* job log to a file, using both the "All Entries" and "Full Details" options.

  2. Check the DataStage project directory and confirm that at least one core file was created. The file names follow the pattern 'core.pid', where pid is the process ID.

  3. Log in to the operating system as the user who ran the job. For each core file present, repeat the next two steps to obtain a stack trace using the gdb debugger.

  4. Issue the following command, adjusting the path to the osh executable to match your installation and replacing core.pid with the actual name of each core file. The default path to osh is shown here.

    gdb /opt/IBM/InformationServer/Server/PXEngine/bin/osh core.pid

  5. Inside the debugger, type the command 'bt' (without quotes) to get the stack trace. Capture this output from the terminal window, along with the core file name corresponding to this stack trace. Then use the command 'quit' (without quotes) to exit the debugger.
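Steps 2 through 5 can also be run non-interactively, which is convenient when there are many core files. This is a minimal sketch assuming bash and the default osh path from step 4; `list_core_files` and `backtrace_cores` are hypothetical helper names, and `OSH` and `GDB` are variables introduced here for illustration. It uses gdb's standard `-batch` and `-ex` options to run `bt` and exit, and writes each trace to a file named after its core file so the traces stay labeled. Running `file` on a core file usually reports which program produced it, which is a quick way to confirm that osh is the right executable to pass to gdb.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Part 3; adjust OSH to your installation.
OSH="${OSH:-/opt/IBM/InformationServer/Server/PXEngine/bin/osh}"
GDB="${GDB:-gdb}"

# Lists core.<pid> files in a directory (default: current directory), sorted,
# skipping the .bt.txt files that backtrace_cores writes.
list_core_files() {
  local dir="${1:-.}"
  find "$dir" -maxdepth 1 -type f -name 'core.[0-9]*' ! -name '*.bt.txt' | sort
}

# Runs "bt" on each core file without an interactive session and saves the
# output to <core file>.bt.txt, headed by the core file name.
backtrace_cores() {
  local core out
  for core in "$@"; do
    out="${core}.bt.txt"
    {
      echo "=== Core file: ${core} ==="
      # -q: no banner; -batch: exit after the -ex commands have run.
      "$GDB" -q -batch -ex bt "$OSH" "$core" 2>&1
    } > "$out"
    echo "wrote ${out}"
  done
}

# Example, run from the project directory as the user who ran the job
# (assumes the core file paths contain no spaces):
#   backtrace_cores $(list_core_files .)
```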

Provide the job log and the stack traces you obtained, clearly labeled with the core file name for each stack trace, to IBM support for problem resolution.

Product: IBM InfoSphere DataStage
Platform: Linux
Versions: 9.1, 8.7, 8.5, 8.1, 8.0.1, 7.5

Document Information

Modified date:
16 June 2018

UID

swg21461167