Using stackit to gather debugging data on UNIX and Linux systems

Troubleshooting

Problem

You have a process which is hanging or using too much CPU, or which has crashed and left behind a core dump, and you need help determining the cause of the problem.

Resolving The Problem

The stackit script provides an easy way to examine running processes and core dumps on AIX, HP-UX, Linux and Solaris systems. Although stackit was written by the IBM MQ support team, it can be used to gather information about any process or core file.

Using stackit

In order to use stackit, you must first download the script to your system and make it executable, for example by running: chmod a+x stackit
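
The setup step above can be sketched as a small shell function. This is only an illustration: the helper name make_runnable is hypothetical and not part of stackit, and the usage comment assumes you saved the download as ./stackit.

```shell
#!/bin/sh
# Hypothetical helper (not part of stackit): mark a downloaded script as
# executable and verify that the execute bit took effect.
make_runnable() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "make_runnable: $script not found" >&2
        return 1
    fi
    chmod a+x "$script" || return 1
    # Succeeds only if the current user can now execute the script.
    [ -x "$script" ]
}

# Usage (assumes the script was saved as ./stackit):
#   make_runnable ./stackit && ./stackit -?
```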

stackit.sh v5.1 (56 KB)

Syntax

stackit -?

stackit [-f File] [-o Options] {-m Match | -n Name | -p Pid}...

stackit [-f File] [-o Options] -c Core -e Executable

General options

-f File

Write the stackit output to a new file, which is important if you need to send it to IBM. Without this option, stackit will display its output to the screen only.

-o Options

Control what information stackit gathers. The default options are sufficient in most cases, but IBM may ask for custom options when diagnosing certain problems:

Option     Data gathered                        Platforms
Default    Usually stack,cred,map,ldd
All        All possible data
stack      Stack trace for all threads          All platforms
cred       Security credentials                 All platforms
map        Library map information              All but some HP-UX
smaps      Detailed address space map           Linux only
ldd        Shared library dependencies          All platforms
status     Process status information           All but HP-UX
files      File descriptor usage                All but some HP-UX
regs       Machine register contents            All platforms
asm        Assembler instructions               AIX only
locks      Mutexes and condition variables      AIX only
thread     Detailed thread data                 AIX only
coreinfo   Core file details                    AIX only
coremap    Core file address space map          AIX only
sig        Disposition for signals              All but Linux & HP-UX
safe       Safe mode (avoids debuggers)         All platforms

Process selection flags

-m Match

Analyze processes whose command line contains a matching pattern. For example, if you use "-m FRED", then stackit will capture any process with the string "FRED" in its command line arguments. Regular expressions are permitted when matching patterns.

-n Name

Analyze processes by the executable program name. For example, if you use "-n runmqlsr" then stackit will capture any process named runmqlsr.

-p Pid

Analyze a process by its process identifier (pid). For example, if you use "-p 12345" then stackit will capture pid 12345, if it exists.

Core file flags

-c Core

The name of the core dump file stackit should analyze. If the core dump is called 'core', you should rename it first to prevent another process from overwriting it.
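
One way to do that rename is sketched below. The helper name preserve_core and the timestamp-plus-pid suffix are assumptions chosen for illustration; any unique name works equally well.

```shell
#!/bin/sh
# Hypothetical helper (not part of stackit): rename a generic "core" file
# so that a later crash cannot overwrite it. Prints the new name.
preserve_core() {
    core=${1:-core}
    if [ ! -f "$core" ]; then
        echo "preserve_core: $core not found" >&2
        return 1
    fi
    # The suffix is the current time plus the shell's pid, to keep it unique.
    new="$core.$(date +%Y%m%d%H%M%S).$$"
    mv "$core" "$new" && printf '%s\n' "$new"
}

# Usage, with an executable path that is only a placeholder:
#   saved=$(preserve_core core) &&
#       stackit -f coreinfo.txt -c "$saved" -e /path/to/program
```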

-e Executable

The path to the executable which created the core file. Please be sure to identify the right program, particularly if there are multiple versions on your system, or the stackit analysis will not be valid.

Usage Notes

The stackit script provides several features to make it easier to use at the command line.

Combining options

You can repeat the -o flag or provide a comma-separated list, or both, when selecting options. For example, the following commands are equivalent:

sh> stackit -f debug.txt -o stack -o locks -o asm -p 29723

sh> stackit -f debug.txt -o stack,locks -o asm -p 29723

sh> stackit -f debug.txt -o stack,locks,asm -p 29723
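
If you build the stackit command line from a script, a helper along these lines can collapse a list of option names into the single comma-separated form. The function name join_options is hypothetical and is not something stackit provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of stackit): join option names into the
# comma-separated form accepted by a single -o flag.
join_options() {
    joined=
    for opt in "$@"; do
        # Add a comma only when something has already been collected.
        joined=${joined:+$joined,}$opt
    done
    printf '%s\n' "$joined"
}

# Usage:
#   stackit -f debug.txt -o "$(join_options stack locks asm)" -p 29723
```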

Repeating process selection flags

You can use the -m, -n and -p flags together, and repeat them as many times as needed, in order to select all the processes you want stackit to analyze. However, a process will be analyzed only once, regardless of how many times it was matched.

Implicit process selection flags

You can even skip the -m and -p flags when selecting processes. When stackit sees extra arguments, it treats numerical values as process identifiers and other values as patterns to match. For example, the following commands are equivalent:

sh> stackit -f ibm.txt -m FRED -p 498932 -p 997106 -m WILMA

sh> stackit -f ibm.txt -m FRED -p 498932 997106 WILMA

sh> stackit -f ibm.txt FRED 498932 997106 WILMA
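
The rule described above can be illustrated with a short function that rewrites bare arguments as explicit flags. This is a sketch of the documented behavior, not stackit's actual parsing code, and the name explicit_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration (not stackit's own code): all-digit arguments
# become -p flags, and every other argument becomes a -m pattern.
explicit_flags() {
    out=
    for arg in "$@"; do
        case $arg in
            ''|*[!0-9]*) flag=-m ;;   # empty or contains a non-digit
            *)           flag=-p ;;   # digits only: a process identifier
        esac
        out="${out:+$out }$flag $arg"
    done
    printf '%s\n' "$out"
}

# Usage:
#   explicit_flags FRED 498932 997106 WILMA
```

Run against the arguments of the last command above, the function prints the flags used in the first command.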

Example 1

Read detailed help for stackit, customized for your system:

sh> stackit -?

Example 2

Generate default output to the file stackit.out from all processes whose command line matches the string TEST.QMGR:

sh> stackit -f stackit.out -m TEST.QMGR

Example 3

Generate stack and map data from processes with "mq" in their name to the file mqstackdata.txt:

sh> stackit -f mqstackdata.txt -o stack,map -n mq

Example 4

Generate default output and registers from processes 7029, 10737 and 41824 to the file regs.txt:

sh> stackit -f regs.txt -o default,regs 7029 10737 41824

Example 5

Generate stack data in safe mode to the file stack.txt from all processes with "db2" in their name, all processes with "AppSrv01" on their command line, and process 9297367:

sh> stackit -f stack.txt -o stack,safe -n db2 -m AppSrv01 -p 9297367

Example 6

Generate default data from a core file called core.11424 generated by the program /usr/local/bin/pmrouter and write it to the file coreinfo.txt:

sh> stackit -f coreinfo.txt -c core.11424 -e /usr/local/bin/pmrouter

Security

You must have authority to examine a process in order for stackit to succeed. In most cases you can analyze only the processes you started. However, some programs use the UNIX setuid/setgid bits to run as another user, and in most cases you must be root to examine such processes.

For example, IBM MQ uses the setuid/setgid bits to run its processes as the mqm user. If you were logged in as mqm when you started an MQ queue manager, the mqm user should be able to run stackit against the queue manager processes; otherwise, you must run stackit as root.

If stackit cannot run against a process, it will tell you who can analyze it. Be sure to look for such messages before sending stackit output to IBM for review. For example:

stackit: Success rate: 0%
stackit: Stackit failed to analyze processes which belong to another user.
Run stackit as scotty to analyze those processes.

The root user can examine any process, so run stackit as root if you are having difficulty.
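
Before you send an output file to IBM, you can scan it for these messages. The sketch below is one way to do so. The helper name check_stackit_output is hypothetical, and it assumes that a fully successful run reports a success rate of 100%, which the messages shown here do not confirm.

```shell
#!/bin/sh
# Hypothetical helper (not part of stackit): scan a stackit output file for
# signs that some processes were not analyzed.
# Returns 0 if nothing suspicious was found, 1 otherwise, 2 if unreadable.
check_stackit_output() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "check_stackit_output: cannot read $file" >&2
        return 2
    fi
    status=0
    # Assumption: a complete run reports "Success rate: 100%".
    if grep 'stackit: Success rate:' "$file" | grep -v ' 100%' >/dev/null; then
        status=1
    fi
    # Print any advice about which user should run stackit instead.
    if grep 'Run stackit as ' "$file"; then
        status=1
    fi
    return $status
}

# Usage:
#   check_stackit_output stackit.out || echo "Rerun stackit as the user shown"
```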

Debuggers

Stackit gathers information using the tools provided by your operating system. One important tool stackit calls is a debugger, which it may use to examine live processes and always uses to examine core files. The debuggers supported by stackit on each operating system are:

AIX: The dbx debugger is in the bos.adt.debug LPP which is part of AIX.
HP-UX: The Wildebeest Debugger (WDB) is an HPE-supported implementation of the GNU Debugger and is available as a free download from HP for Itanium systems.
Linux: The GNU Debugger (gdb) is available in the gdb RPM on many Linux distributions.
Solaris: The modular debugger (mdb) is part of the Solaris SUNWmdb package.
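
To check ahead of time whether the expected debugger is installed, you could use something like the following sketch. The function names are hypothetical, the operating system names are those printed by "uname -s", and the command name gdb for WDB on HP-UX is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of stackit): map an operating system name,
# as printed by "uname -s", to the debugger command listed above.
debugger_for_os() {
    case $1 in
        AIX)   echo dbx ;;
        HP-UX) echo gdb ;;   # assumption: WDB is invoked as gdb
        Linux) echo gdb ;;
        SunOS) echo mdb ;;   # Solaris reports itself as SunOS
        *)     return 1 ;;
    esac
}

# Report whether the expected debugger is on the PATH of this system.
check_debugger() {
    dbg=$(debugger_for_os "$(uname -s)") || {
        echo "check_debugger: no debugger known for $(uname -s)" >&2
        return 1
    }
    if command -v "$dbg" >/dev/null 2>&1; then
        echo "$dbg found"
    else
        echo "$dbg not found" >&2
        return 1
    fi
}

# Usage:
#   check_debugger || echo "Install the debugger, or try the 'safe' option"
```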

Stackit goes to great lengths to ensure that it does not interrupt or terminate processes when using a debugger, even if you kill or cancel stackit (e.g. by using Ctrl-C). However, there is a small chance that a fault in the debugger or in stackit itself could terminate a process. To minimize that risk, use stackit only when you are diagnosing a problem.

You can also add the 'safe' option to the stackit command line, preventing stackit from using debuggers against processes on your system. Stackit may not be able to gather all the data you requested, especially on HP-UX and Linux, but the safe option ensures it cannot terminate any processes by accident. Example 5 demonstrates the use of the 'safe' option.

Sample Output

This sample output shows the kind of information stackit can gather. In this example, stackit was able to analyze one process successfully, but lacked authority to analyze another:

sh> stackit -o stack,asm runmqlsr inetd
stackit: V5.1 running on AIX 7.3 (powerpc) with arguments: -o stack,asm runmqlsr inetd

Analyzing process 9896164 1 Mar 2024 at 16:46:10 GMT
==========================================================================

PID PPID STARTED EUSER EGROUP COMMAND
9896164 10420476 Mar 01 mqm mqm runmqlsr -m V8QM -t TCP -p 1607

Thread Stacks:

9896164: /usr/mqm/bin/runmqlsr -m V8QM -t TCP -p 1607
---------- tid# 36700257 (pthread ID: 1) ----------
0x0900000000112394 naccept(??, ??, ??) + 0xb4
0x09000000117357e0 cciTcpListenConv() + 0xd00
0x090000000818f894 ccxListenConv() + 0x274
0x00000001000004dc WaitForConnectLoop() + 0x7c
0x00000001000011a8 main() + 0x888
0x0000000100000288 __start() + 0x90
---------- tid# 38011017 (pthread ID: 772) ----------
0x0900000000154834 __fd_poll(??, ??, ??) + 0xb4
0x09000000076547a8 xcsWaitFd() + 0x9c8
0x0900000007653d84 xcsWaitSocket() + 0x44
0x0900000007fcde94 cccJobMonitor() + 0x11d4
0x0900000007601c30 ThreadMain() + 0x15d0
0x09000000004f4d30 _pthread_body(??) + 0xf0
---------- tid# 37945479 (pthread ID: 515) ----------
0x0900000000507584 _event_sleep(??, ??, ??, ??, ??, ??) + 0x5a4
0x0900000000508018 _event_wait(??, ??) + 0x2b8
0x09000000005162c4 _cond_wait_local(??, ??, ??) + 0x4e4
0x090000000051689c _cond_wait(??, ??, ??) + 0xbc
0x0900000000517508 pthread_cond_wait(??, ??) + 0x1a8
0x090000000762d278 xtmTimerThread() + 0x378
0x0900000007601c30 ThreadMain() + 0x15d0
0x09000000004f4d30 _pthread_body(??) + 0xf0
---------- tid# 37486713 (pthread ID: 258) ----------
0x0900000000507584 _event_sleep(??, ??, ??, ??, ??, ??) + 0x5a4
0x090000000050c044 _p_sigtimedwait(??, ??, ??) + 0x4a4
0x090000000762368c xehAsySignalMonitor() + 0x7ec
0x0900000007601c30 ThreadMain() + 0x15d0
0x09000000004f4d30 _pthread_body(??) + 0xf0

Assembler Instructions

0x900000000507560 (_event_sleep+0x580) beq 0x9000000005075f8
0x900000000507564 (_event_sleep+0x584) li r6,0x0
0x900000000507568 (_event_sleep+0x588) ori r5,r29,0x0
0x90000000050756c (_event_sleep+0x58c) cmpi cr4,0x1,r0,0x0
0x900000000507570 (_event_sleep+0x590) ori r0,r0,0x0
0x900000000507574 (_event_sleep+0x594) ori r0,r0,0x0
0x900000000507578 (_event_sleep+0x598) ori r0,r0,0x0
0x90000000050757c (_event_sleep+0x59c) ori r1,r1,0x0
0x900000000507580 (_event_sleep+0x5a0) bl 0x900000000508710
0x900000000507584 (_event_sleep+0x5a4) ld r2,0x28(r1)
0x900000000507588 (_event_sleep+0x5a8) sli r0,r3,0x0
0x90000000050758c (_event_sleep+0x5ac) stw r3,0x70(r1)
0x900000000507590 (_event_sleep+0x5b0) addi r4,0x78(r1)
0x900000000507594 (_event_sleep+0x5b4) ld r3,0x30(r31)
0x900000000507598 (_event_sleep+0x5b8) ori r5,r31,0x0
0x90000000050759c (_event_sleep+0x5bc) ori r6,r30,0x0
0x9000000005075a0 (_event_sleep+0x5c0) cmpi cr0,0x1,r3,0x0
0x9000000005075a4 (_event_sleep+0x5c4) addi r7,0x70(r1)
0x9000000005075a8 (_event_sleep+0x5c8) beq 0x90000000050744c
0x9000000005075ac (_event_sleep+0x5cc) bl 0x900000000507960
0x9000000005075b0 (_event_sleep+0x5d0) cmpi cr0,0x0,r3,0x0

Analyzing process 4194432 1 Mar 2024 at 16:46:10 GMT
==========================================================================

PID PPID STARTED EUSER EGROUP COMMAND
4194432 3866762 Jan 30 root system /usr/sbin/inetd

stackit: You must run stackit as root to analyze process 4194432.
stackit: Analysis failed for process 4194432.

Summary of results 1 Mar 2024 at 16:46:11 GMT
==========================================================================
stackit: Success rate: 50%
stackit: Stackit failed to analyze processes which belong to another user.
Run stackit as root to analyze those processes.

DISCLAIMER: All source code and/or binaries attached to this document are referred to here as "the Program". IBM is not providing program services of any kind for the Program. IBM is providing the Program on an "AS IS" basis without warranty of any kind. IBM WILL NOT BE LIABLE FOR ANY ACTUAL, DIRECT, SPECIAL, INCIDENTAL, OR INDIRECT DAMAGES OR FOR ANY ECONOMIC CONSEQUENTIAL DAMAGES (INCLUDING LOST PROFITS OR SAVINGS), EVEN IF IBM, OR ITS RESELLER, HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
