Using stackit to gather debugging data on UNIX and Linux systems

Technote (troubleshooting)


Problem (Abstract)

You have a process which is hanging or using too much CPU, or which has crashed and left behind a core dump, and you need help determining the cause of the problem.

Resolving the problem

The stackit script provides an easy way to examine running processes and core dumps on AIX, HP-UX, Linux and Solaris systems. Although stackit was written by the WebSphere MQ support team, it can be used to gather information about any process or core file.



Using stackit

To use stackit, first download the script to your system and make it executable, for example by running: chmod a+x stackit


    Syntax


    stackit -?

    stackit [-f File] [-o Options] {-m Match | -n Name | -p Pid}...

    stackit [-f File] [-o Options] -c Core -e Executable



    General options

    -f File

      Write the stackit output to a new file, which is important if you need to send it to IBM. Without this option, stackit will display its output to the screen only.

    -o Options
      Control what information stackit gathers. The default options are sufficient in most cases, but IBM may ask for custom options when diagnosing certain problems:


       Default : Usually stack,cred,map,ldd
           All : All possible data

         stack : Stack trace for all threads        All platforms
          cred : Security credentials               All platforms
           map : Address space map                  All but some HP-UX
           ldd : Shared library dependencies        All platforms
        status : Process status information         All but HP-UX
         files : File descriptor usage              All but some HP-UX
          regs : Machine register contents          All platforms
           asm : Assembler instructions             AIX only
         locks : Mutexes and condition variables    AIX only
        thread : Detailed thread data               AIX only
           sig : Disposition for signals            All but Linux & HP-UX
          safe : Safe mode (avoids debuggers)       All platforms

    Process selection flags

    -m Match

      Analyze processes whose command line contains a matching pattern. For example, if you use "-m FRED", then stackit will capture any process with the string "FRED" in its command line arguments. Regular expressions are permitted when matching patterns.

    -n Name
      Analyze processes by the executable program name. For example, if you use "-n runmqlsr" then stackit will capture any process named runmqlsr.

    -p Pid
      Analyze a process by its process identifier (pid). For example, if you use "-p 12345" then stackit will capture pid 12345, if it exists.
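
To preview which processes a -m pattern might select before running stackit, a rough approximation can be built from ps output. This is only a sketch and is not how stackit itself matches processes: preview_match is a hypothetical helper, and it assumes the pattern behaves like an awk extended regular expression, which the text above does not confirm.

```shell
# preview_match PATTERN
# Hypothetical helper (not part of stackit): reads lines of the form
# "PID COMMAND-LINE" on stdin and prints each PID whose command line
# matches PATTERN, treated here as an awk extended regular expression.
preview_match() {
    awk -v pat="$1" '{
        pid = $1
        $1 = ""                 # drop the PID so only the command line is tested
        if ($0 ~ pat) print pid
    }'
}
```

Saving the ps output to a file first keeps the helper's own awk process out of the list:

    sh> ps -e -o pid= -o args= > ps.txt
    sh> preview_match TEST.QMGR < ps.txt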


    Core file flags

    -c Core

      The name of the core dump file stackit should analyze. If the core dump is called 'core', you should rename it first to prevent another process from overwriting it.

    -e Executable
      The path to the executable which created the core file. Please be sure to identify the right program, particularly if there are multiple versions on your system, or the stackit analysis will not be valid.
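
The advice to rename a file called 'core' before analyzing it can be scripted. The following is a minimal sketch, not something stackit provides: preserve_core is a hypothetical helper, and the naming scheme (timestamp plus shell PID) is just one way to get a name that a later crash is unlikely to reuse.

```shell
# preserve_core CORE
# Hypothetical helper (not part of stackit): renames CORE to a unique
# name built from a timestamp and this shell's PID, then prints the
# new name on success.
preserve_core() {
    core=$1
    new="${core}.$(date +%Y%m%d%H%M%S).$$"
    mv -- "$core" "$new" && printf '%s\n' "$new"
}
```

The printed name can then be passed to the -c flag:

    sh> saved=$(preserve_core core)
    sh> stackit -f coreinfo.txt -c "$saved" -e /usr/local/bin/pmrouter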



Usage Notes

The stackit script provides several features to make it easier to use at the command line.


    Combining options

    You can repeat the -o flag or provide a comma-separated list, or both, when selecting options. For example, the following commands are equivalent:


    sh> stackit -f debug.txt -o stack -o locks -o asm -p 29723

    sh> stackit -f debug.txt -o stack,locks -o asm -p 29723

    sh> stackit -f debug.txt -o stack,locks,asm -p 29723
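
One way to see why these three forms are equivalent is to flatten the repeated and comma-separated values into a single list. The sketch below illustrates that idea only; merge_options is a hypothetical helper, not stackit's own parsing code, and dropping duplicate entries is an extra convenience of the sketch rather than documented stackit behavior.

```shell
# merge_options LIST...
# Hypothetical helper (not stackit's own code): flattens one or more
# comma-separated option lists into a single comma-separated list,
# dropping empty entries and duplicates while keeping first-seen order.
merge_options() {
    printf '%s\n' "$@" | tr ',' '\n' | awk 'NF && !seen[$0]++' | paste -s -d , -
}
```

Each of the three argument styles above reduces to the same list:

    sh> merge_options stack locks asm
    sh> merge_options stack,locks asm
    sh> merge_options stack,locks,asm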


    Repeating process selection flags

    You can use the -m, -n and -p flags together, and repeat them as many times as needed, in order to select all the processes you want stackit to analyze. However, a process will be analyzed only once, regardless of how many times it was matched.



    Implicit process selection flags

    You can even skip the -m and -p flags when selecting processes. When stackit sees extra arguments, it treats numerical values as process identifiers and other values as patterns to match. For example, the following commands are equivalent:


    sh> stackit -f ibm.txt -m FRED -p 498932 -p 997106 -m WILMA

    sh> stackit -f ibm.txt -m FRED -p 498932 997106 WILMA

    sh> stackit -f ibm.txt FRED 498932 997106 WILMA
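
The rule described above (numerical values become process identifiers, everything else becomes a pattern) can be sketched in a few lines of shell. This is an illustration of the rule, not stackit's actual code, and explicit_flags is a hypothetical name.

```shell
# explicit_flags ARG...
# Hypothetical helper (not stackit's own code): prints the explicit
# form of each bare argument, "-p" for all-digit values and "-m" for
# everything else (including the empty string).
explicit_flags() {
    for arg in "$@"; do
        case $arg in
            ''|*[!0-9]*) printf '%s %s\n' -m "$arg" ;;   # contains a non-digit
            *)           printf '%s %s\n' -p "$arg" ;;   # digits only
        esac
    done
}
```

Applied to the bare arguments from the last command above, it prints one explicit flag per line:

    sh> explicit_flags FRED 498932 997106 WILMA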



    Examples


      Example 1

      Read detailed help for stackit, customized for your system:

      sh> stackit -?


      Example 2

      Generate default output to the file stackit.out from all processes whose command line matches the string TEST.QMGR:

      sh> stackit -f stackit.out -m TEST.QMGR


      Example 3

      Generate stack and map data from processes with "mq" in their name to the file mqstackdata.txt:

      sh> stackit -f mqstackdata.txt -o stack,map -n mq


      Example 4

      Generate default output and registers from processes 7029, 10737 and 41824 to the file regs.txt:

      sh> stackit -f regs.txt -o default,regs 7029 10737 41824


      Example 5

      Generate stack data in safe mode to the file stack.txt from all processes with "db2" in their name, all processes with "AppSrv01" on their command line, and process 9297367:

      sh> stackit -f stack.txt -o stack,safe -n db2 -m AppSrv01 -p 9297367


      Example 6

      Generate default data from a core file called core.11424 generated by the program /usr/local/bin/pmrouter and write it to the file coreinfo.txt:

      sh> stackit -f coreinfo.txt -c core.11424 -e /usr/local/bin/pmrouter




    Security

    You must have authority to examine a process in order for stackit to succeed. In most cases you can analyze only the processes you started. However, some programs use the UNIX setuid/setgid bits to run as another user, and in most cases you must be root to examine such processes.


    For example, WebSphere MQ uses the setuid/setgid bits to run as the mqm user. If you were logged in as mqm when you started WebSphere MQ, then the mqm user should be able to run stackit against the queue manager processes; otherwise, you must run stackit as root.


    If stackit cannot run against a process, it will tell you who can analyze it. Be sure to look for such messages before sending stackit output to IBM for review. For example:

      stackit: Success rate: 0%
      stackit: Stackit failed to analyze processes which belong to another user.
               Run stackit as scotty to analyze those processes.


    The root user can examine any process, so run stackit as root if you are having difficulty.
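
Checking a saved output file for such messages before sending it to IBM can be automated. The sketch below assumes the "Success rate" line looks exactly like the example shown above; the wording may differ between stackit versions, and check_stackit_output is a hypothetical helper, not part of stackit.

```shell
# check_stackit_output FILE
# Hypothetical helper: reports whether the last "Success rate" line in
# FILE shows 100%. The message format is copied from the example in
# this document and is an assumption about other stackit versions.
check_stackit_output() {
    rate=$(sed -n 's/^ *stackit: Success rate: \([0-9][0-9]*\)%.*$/\1/p' "$1" | tail -n 1)
    if [ -z "$rate" ]; then
        echo "no success rate found"
        return 2
    elif [ "$rate" -lt 100 ]; then
        echo "incomplete: success rate ${rate}%"
        return 1
    fi
    echo "complete"
}
```

A result other than "complete" suggests rereading the file for the message that names the user who can analyze the remaining processes:

    sh> check_stackit_output stackit.out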



    Debuggers

    Stackit gathers information using the tools provided by your operating system. One important tool stackit calls is a debugger, which it may use to examine live processes and always uses to examine core files. The debuggers supported by stackit on each operating system are:

      AIX: The dbx debugger is in the bos.adt.debug LPP which is part of AIX.

      HP-UX: The Wildebeest Debugger (WDB) is an HP supported implementation of the GNU Debugger and is available for free download from HP for PA-RISC and Itanium systems.

      Linux: The GNU Debugger (gdb) is available in the gdb RPM on many Linux distributions.

      Solaris: The modular debugger (mdb) is part of the Solaris SUNWmdb package.


    Stackit goes to great lengths to ensure that it does not interrupt or terminate processes when using a debugger, even if you kill or cancel stackit (e.g. by using Ctrl-C). However, there is a small chance that a fault in the debugger or in stackit itself could terminate a process. To minimize the chance of such a failure, use stackit only when you are diagnosing a problem.

    You can also add the 'safe' option to the stackit command line, preventing stackit from using debuggers against processes on your system. Stackit may not be able to gather all the data you requested, especially on HP-UX and Linux, but the safe option ensures it cannot terminate any processes by accident. Example 5 demonstrates the use of the 'safe' option.
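
Because stackit depends on a platform debugger for much of its data, it can be useful to confirm that the debugger is installed before a problem occurs. The following sketch maps the operating system name to the debugger listed above and checks whether it is on the PATH. Both function names are hypothetical, and the command name used for WDB on HP-UX ("gdb") is an assumption that may not hold on every installation.

```shell
# debugger_for_os [OSNAME]
# Hypothetical helper: maps an operating system name, as printed by
# "uname -s", to the debugger command for that platform. Fails for
# systems not covered in this document.
# Assumption: WDB on HP-UX is invoked as "gdb".
debugger_for_os() {
    case ${1:-$(uname -s)} in
        AIX)    echo dbx ;;
        HP-UX)  echo gdb ;;
        Linux)  echo gdb ;;
        SunOS)  echo mdb ;;     # "uname -s" prints SunOS on Solaris
        *)      return 1 ;;
    esac
}

# have_debugger [OSNAME]
# Succeeds if the debugger for the given (or current) system is on PATH.
have_debugger() {
    dbg=$(debugger_for_os "$@") && command -v "$dbg" >/dev/null 2>&1
}
```

If have_debugger fails, the 'safe' option remains available, with the data limitations described above:

    sh> have_debugger || echo "debugger not found"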



    Sample Output

    This sample output shows the kind of information stackit can gather. In this example, stackit was able to analyze one process successfully, but lacked authority to analyze another:

      sh> stackit -o stack,asm runmqlsr inetd
      stackit: V4.4 running on AIX 7.1 (powerpc) with arguments: -o stack,asm
               runmqlsr inetd
               

      Analyzing process 9896164                  1 December 2014 at 16:46:10 GMT
      ==========================================================================

           PID     PPID  STARTED  EUSER  EGROUP  COMMAND
       9896164 10420476   May 08    mqm     mqm  runmqlsr -m V8QM -t TCP -p 1607


        Thread Stacks:

          9896164: /usr/mqm/bin/runmqlsr -m V8QM -t TCP -p 1607
          ---------- tid# 36700257 (pthread ID:      1) ----------
          0x0900000000112394  naccept(??, ??, ??) + 0xb4
          0x09000000117357e0  cciTcpListenConv() + 0xd00
          0x090000000818f894  ccxListenConv() + 0x274
          0x00000001000004dc  WaitForConnectLoop() + 0x7c
          0x00000001000011a8  main() + 0x888
          0x0000000100000288  __start() + 0x90
          ---------- tid# 38011017 (pthread ID:    772) ----------
          0x0900000000154834  __fd_poll(??, ??, ??) + 0xb4
          0x09000000076547a8  xcsWaitFd() + 0x9c8
          0x0900000007653d84  xcsWaitSocket() + 0x44
          0x0900000007fcde94  cccJobMonitor() + 0x11d4
          0x0900000007601c30  ThreadMain() + 0x15d0
          0x09000000004f4d30  _pthread_body(??) + 0xf0
          ---------- tid# 37945479 (pthread ID:    515) ----------
          0x0900000000507584  _event_sleep(??, ??, ??, ??, ??, ??) + 0x5a4
          0x0900000000508018  _event_wait(??, ??) + 0x2b8
          0x09000000005162c4  _cond_wait_local(??, ??, ??) + 0x4e4
          0x090000000051689c  _cond_wait(??, ??, ??) + 0xbc
          0x0900000000517508  pthread_cond_wait(??, ??) + 0x1a8
          0x090000000762d278  xtmTimerThread() + 0x378
          0x0900000007601c30  ThreadMain() + 0x15d0
          0x09000000004f4d30  _pthread_body(??) + 0xf0
          ---------- tid# 37486713 (pthread ID:    258) ----------
          0x0900000000507584  _event_sleep(??, ??, ??, ??, ??, ??) + 0x5a4
          0x090000000050c044  _p_sigtimedwait(??, ??, ??) + 0x4a4
          0x090000000762368c  xehAsySignalMonitor() + 0x7ec
          0x0900000007601c30  ThreadMain() + 0x15d0
          0x09000000004f4d30  _pthread_body(??) + 0xf0


        Assembler Instructions

          0x900000000507560 (_event_sleep+0x580)  beq  0x9000000005075f8
          0x900000000507564 (_event_sleep+0x584)   li  r6,0x0
          0x900000000507568 (_event_sleep+0x588)  ori  r5,r29,0x0
          0x90000000050756c (_event_sleep+0x58c) cmpi  cr4,0x1,r0,0x0
          0x900000000507570 (_event_sleep+0x590)  ori  r0,r0,0x0
          0x900000000507574 (_event_sleep+0x594)  ori  r0,r0,0x0
          0x900000000507578 (_event_sleep+0x598)  ori  r0,r0,0x0
          0x90000000050757c (_event_sleep+0x59c)  ori  r1,r1,0x0
          0x900000000507580 (_event_sleep+0x5a0)   bl  0x900000000508710
          0x900000000507584 (_event_sleep+0x5a4)   ld  r2,0x28(r1)
          0x900000000507588 (_event_sleep+0x5a8)  sli  r0,r3,0x0
          0x90000000050758c (_event_sleep+0x5ac)  stw  r3,0x70(r1)
          0x900000000507590 (_event_sleep+0x5b0) addi  r4,0x78(r1)
          0x900000000507594 (_event_sleep+0x5b4)   ld  r3,0x30(r31)
          0x900000000507598 (_event_sleep+0x5b8)  ori  r5,r31,0x0
          0x90000000050759c (_event_sleep+0x5bc)  ori  r6,r30,0x0
          0x9000000005075a0 (_event_sleep+0x5c0) cmpi  cr0,0x1,r3,0x0
          0x9000000005075a4 (_event_sleep+0x5c4) addi  r7,0x70(r1)
          0x9000000005075a8 (_event_sleep+0x5c8)  beq  0x90000000050744c
          0x9000000005075ac (_event_sleep+0x5cc)   bl  0x900000000507960
          0x9000000005075b0 (_event_sleep+0x5d0) cmpi  cr0,0x0,r3,0x0

      Analyzing process 4194432                  1 December 2014 at 16:46:10 GMT
      ==========================================================================

           PID     PPID  STARTED  EUSER  EGROUP  COMMAND
       4194432  3866762   Apr 30   root  system  /usr/sbin/inetd


      stackit: You must run stackit as root to analyze process 4194432.
      stackit: Analysis failed for process 4194432.




      Summary of results                         1 December 2014 at 16:46:11 GMT
      ==========================================================================
      stackit: Success rate: 50%
      stackit: Stackit failed to analyze processes which belong to another user.
               Run stackit as root to analyze those processes.

    DISCLAIMER: All source code and/or binaries attached to this document are referred to here as "the Program". IBM is not providing program services of any kind for the Program. IBM is providing the Program on an "AS IS" basis without warranty of any kind. IBM WILL NOT BE LIABLE FOR ANY ACTUAL, DIRECT, SPECIAL, INCIDENTAL, OR INDIRECT DAMAGES OR FOR ANY ECONOMIC CONSEQUENTIAL DAMAGES (INCLUDING LOST PROFITS OR SAVINGS), EVEN IF IBM, OR ITS RESELLER, HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

    Product Alias/Synonym

    WMQ MQ

    Document information


    More support for:

    WebSphere MQ
    Problem Determination

    Software version:

    5.3, 6.0, 7.0, 7.1, 7.5, 8.0

    Operating system(s):

    AIX, HP-UX, Linux, Solaris

    Software edition:

    All Editions

    Reference #:

    1179404

    Modified date:

    2014-12-02
