Using stackit to gather debugging data on UNIX and Linux systems

Technote (troubleshooting)


Problem (Abstract)

You have a process which is hanging or using too much CPU, or which has crashed and left behind a core dump, and you need help determining the cause of the problem.

Resolving the problem

The stackit script provides an easy way to examine running processes and core dumps on AIX, HP-UX, Linux and Solaris systems. Although stackit was written by the WebSphere MQ support team, it can be used to gather information about any process or core file.



Using stackit


To use stackit, first download the script to your system and make it executable, for example by running: chmod a+x stackit


    Syntax


    stackit -?

    stackit [-f File] [-o Options] {-m Match | -n Name | -p Pid}...

    stackit [-f File] [-o Options] -c Core -e Executable



    General options

    -f File

      Write the stackit output to a new file, which is important if you need to send it to IBM. Without this option, stackit displays its output on the screen only.

    -o Options
      Control what information stackit gathers. The default options are sufficient in most cases, but IBM may ask for custom options when diagnosing certain problems:


       Default : Usually stack,cred,map,ldd
           All : All possible data

         stack : Stack trace for all threads        All platforms
          cred : Security credentials               All platforms
           map : Address space map                  All but some HP-UX
           ldd : Shared library dependencies        All platforms
        status : Process status information         All but HP-UX
         files : File descriptor usage              All but some HP-UX
          regs : Machine register contents          All platforms
           asm : Assembler instructions             AIX only
         locks : Mutexes and condition variables    AIX only
        thread : Detailed thread data               AIX only
           sig : Disposition for signals            All but Linux & HP-UX
          safe : Safe mode (avoids debuggers)       All platforms

    Process selection flags

    -m Match

      Analyze processes whose command line contains a matching pattern. For example, if you use "-m FRED", then stackit will capture any process with the string "FRED" in its command line. Regular expressions are permitted when matching patterns.

    -n Name
      Analyze processes by the executable program name. For example, if you use "-n runmqlsr" then stackit will capture any process named runmqlsr.

    -p Pid
      Analyze a process by its process identifier (pid). For example, if you use "-p 12345" then stackit will capture pid 12345, if it exists.


    Core file flags

    -c Core

      The name of the core dump file stackit should analyze. If the core dump is called 'core', you should rename it first to prevent another process from overwriting it.

    -e Executable
      The path to the executable which created the core file. Please be sure to identify the right program, particularly if there are multiple versions on your system, or the stackit analysis will not be valid.
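
    The renaming advice for the -c flag can be sketched as a small shell helper. The function name preserve_core and the timestamp-based naming scheme are assumptions made for illustration; they are not part of stackit:

```shell
# Hypothetical helper: give a core file a unique name so that a later
# crash cannot overwrite it. Prints the new name on success.
preserve_core() {
    core=$1
    if [ ! -f "$core" ]; then
        echo "preserve_core: $core not found" >&2
        return 1
    fi
    # Assumed naming scheme: original name plus a timestamp and our pid.
    new="$core.$(date +%Y%m%d%H%M%S).$$"
    mv "$core" "$new" && echo "$new"
}
```

    The saved name could then be passed to stackit with -c, for example: saved=$(preserve_core core) followed by stackit -f coreinfo.txt -c "$saved" -e /usr/local/bin/pmrouter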



Usage Notes

The stackit script provides several features to make it easier to use at the command line.


    Combining options

    You can repeat the -o flag or provide a comma-separated list, or both, when selecting options. For example, the following commands are equivalent:


    sh> stackit -f debug.txt -o stack -o locks -o asm -p 29723

    sh> stackit -f debug.txt -o stack,locks -o asm -p 29723

    sh> stackit -f debug.txt -o stack,locks,asm -p 29723
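
    One way to see why these three forms are equivalent is to flatten every -o value into a single list. The sketch below illustrates that general idea only; it is not stackit's actual code, and merge_opts is a hypothetical name:

```shell
# Hypothetical sketch: merge repeated and comma-separated option values
# into one comma-separated list, dropping duplicates and keeping the
# order in which each option was first seen.
merge_opts() {
    printf '%s\n' "$@" | tr ',' '\n' | awk '
        NF && !seen[$0]++ { out = out (out == "" ? "" : ",") $0 }
        END { print out }'
}
```

    With this sketch, "merge_opts stack locks asm", "merge_opts stack,locks asm" and "merge_opts stack,locks,asm" all produce the same list.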


    Repeating process selection flags

    You can use the -m, -n and -p flags together, and repeat them as many times as needed, in order to select all the processes you want stackit to analyze. However, a process will be analyzed only once, regardless of how many times it was matched.



    Implicit process selection flags

    You can even skip the -m and -p flags when selecting processes. When stackit sees extra arguments, it treats numerical values as process identifiers and other values as patterns to match. For example, the following commands are equivalent:


    sh> stackit -f ibm.txt -m FRED -p 498932 -p 997106 -m WILMA

    sh> stackit -f ibm.txt -m FRED -p 498932 997106 WILMA

    sh> stackit -f ibm.txt FRED 498932 997106 WILMA
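
    The rule for bare arguments can be sketched in a few lines of shell. This is an illustration of the rule as described above, not the code stackit uses; classify_arg and explicit_flags are hypothetical names:

```shell
# Hypothetical sketch: decide how a bare argument would be interpreted.
# Values made up only of digits count as pids; anything else is a pattern.
classify_arg() {
    case $1 in
        ''|*[!0-9]*) printf '%s\n' pattern ;;   # empty or has a non-digit
        *)           printf '%s\n' pid ;;       # digits only
    esac
}

# Rewrite a list of bare arguments into explicit -m / -p flags.
explicit_flags() {
    out=
    for arg in "$@"; do
        if [ "$(classify_arg "$arg")" = pid ]; then
            flag=-p
        else
            flag=-m
        fi
        out="$out${out:+ }$flag $arg"
    done
    printf '%s\n' "$out"
}
```

    For example, "explicit_flags FRED 498932 997106 WILMA" prints the flags used in the first command above: -m FRED -p 498932 -p 997106 -m WILMA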



    Examples


      Example 1

      Read detailed help for stackit, customized for your system:

      sh> stackit -?


      Example 2

      Generate default output to the file stackit.out from all processes whose command line matches the string TEST.QMGR:

      sh> stackit -f stackit.out -m TEST.QMGR


      Example 3

      Generate stack and map data from processes with "mq" in their name to the file mqstackdata.txt:

      sh> stackit -f mqstackdata.txt -o stack,map -n mq


      Example 4

      Generate default output and registers from processes 7029, 10737 and 41824 to the file regs.txt:

      sh> stackit -f regs.txt -o default,regs 7029 10737 41824


      Example 5

      Generate stack data in safe mode to the file stack.txt from all processes with "db2" in their name, all processes with "AppSrv01" on their command line, and process 9297367:

      sh> stackit -f stack.txt -o stack,safe -n db2 -m AppSrv01 -p 9297367


      Example 6

      Generate default data from a core file called core.11424 generated by the program /usr/local/bin/pmrouter and write it to the file coreinfo.txt:

      sh> stackit -f coreinfo.txt -c core.11424 -e /usr/local/bin/pmrouter




    Security

    You must have authority to examine a process in order for stackit to succeed. In most cases you can analyze only the processes you started. However, some programs use the UNIX setuid/setgid bits to run as another user, and in most cases you must be root to examine such processes.


    For example, WebSphere MQ uses the setuid/setgid bits to run as the mqm user. If you were logged in as mqm when you started WebSphere MQ, then the mqm user should be able to run stackit against the queue manager processes; otherwise, you must run stackit as root.


    If stackit cannot run against a process, it will tell you who can analyze it. Be sure to look for such messages before sending stackit output to IBM for review. For example:

      stackit: Success rate: 0%
      stackit: Stackit failed to analyze processes which belong to another user.
               Run stackit as scotty to analyze those processes.


    The root user can examine any process, so run stackit as root if you are having difficulty.
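
    Checking for these messages before sending output to IBM can be partly automated. The sketch below assumes that the file written with -f contains a "Success rate: N%" line in the format shown above, and that a fully successful run reports 100%; both are assumptions, and check_success_rate is a hypothetical name:

```shell
# Hypothetical helper: succeed only if a saved stackit output file
# reports a 100% success rate. The "Success rate: N%" format is taken
# from the sample messages in this document.
check_success_rate() {
    rate=$(sed -n 's/.*Success rate: \([0-9][0-9]*\)%.*/\1/p' "$1" | tail -n 1)
    if [ -z "$rate" ]; then
        echo "No success rate found in $1" >&2
        return 2
    fi
    if [ "$rate" -lt 100 ]; then
        echo "Only $rate% of processes were analyzed; check the messages in $1" >&2
        return 1
    fi
    return 0
}
```

    A return code of 1 suggests rerunning stackit as the user named in its messages, or as root, before sending the file.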



    Debuggers

    Stackit gathers information using the tools provided by your operating system. One important tool stackit calls is a debugger, which it may use to examine live processes and always uses to examine core files. The debuggers supported by stackit on each operating system are:

      AIX: The dbx debugger is in the bos.adt.debug LPP which is part of AIX.

      HP-UX: The Wildebeest Debugger (WDB) is an HP supported implementation of the GNU Debugger and is available for free download from HP for PA-RISC and Itanium systems.

      Linux: The GNU Debugger (gdb) is available in the gdb RPM on many Linux distributions.

      Solaris: The modular debugger (mdb) is part of the Solaris SUNWmdb package.


    Stackit goes to great lengths to ensure that it does not interrupt or terminate processes when using a debugger, even if you kill or cancel stackit (e.g. by using Ctrl-C). However, there is a small chance that a fault in the debugger or in stackit itself could terminate a process. You should use stackit only when diagnosing a problem to minimize the chance of a failure.

    You can also add the 'safe' option to the stackit command line, preventing stackit from using debuggers against processes on your system. Stackit may not be able to gather all the data you requested, especially on HP-UX and Linux, but the safe option ensures it cannot terminate any processes by accident. Example 5 demonstrates the use of the 'safe' option.
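
    Before running stackit, it can be useful to confirm that the debugger for your platform is installed. The sketch below maps the output of "uname -s" to the debuggers listed above. The function names are hypothetical, and the command name used for WDB on HP-UX is an assumption; this is not how stackit itself locates a debugger:

```shell
# Map an operating system name (as printed by "uname -s") to the
# debugger command named in the list above.
debugger_for_os() {
    case $1 in
        AIX)   echo dbx ;;
        HP-UX) echo gdb ;;   # assumption: WDB is invoked as gdb
        Linux) echo gdb ;;
        SunOS) echo mdb ;;   # Solaris reports itself as SunOS
        *)     return 1 ;;
    esac
}

# Report whether that debugger is on the PATH of the current system.
have_debugger() {
    dbg=$(debugger_for_os "$(uname -s)") || return 1
    command -v "$dbg" >/dev/null 2>&1
}
```

    If have_debugger fails, live processes can still be examined with the 'safe' option, although core file analysis always requires a debugger.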



    Sample Output

    This sample output shows the kind of information stackit can gather. In this example, stackit was able to analyze one process successfully, but lacked authority to analyze another:

      sh> stackit -o stack,asm runmqlsr inetd
      stackit: V4.3 running on AIX 7.1 (powerpc) with arguments: -o stack,asm
               runmqlsr inetd
               

      Analyzing process 5963782                   30 June 2012 at 00:39:35 GMT
      ========================================================================

             PID     PPID  STARTED  EUSER  EGROUP  COMMAND
          255780   163782   Jun 19    mqm     mqm  runmqlsr -m BACH.QM -t TCP


        Thread Summary

           thread  state-k  state-u      k-tid  mode  held scope function
           $t1     wait     running  102629381     k    no   sys
          >$t2     run      blocked   87818279     k    no   sys  _event_sleep
           $t3     wait     blocked  101711909     k    no   sys
           $t4     wait     running   31129817     k    no   sys


        Thread Stacks

          Thread $t1:
          .() at 0x90000004b9ec898
          cciTcpListenConv() at 0x9000000039d97e0
          ccxListenConv() at 0x900000000ee2cd4
          WaitForConnectLoop() at 0x1000004dc
          main() at 0x100001198

          Thread $t2:
          _event_sleep(??, ??, ??, ??, ??, ??) at 0x9000000007df324
          _p_sigtimedwait(??, ??, ??) at 0x9000000007e3dc4
          xehAsySignalMonitor() at 0x90000000091014c
          ThreadMain() at 0x9000000008ee7f0

          Thread $t3:
          .() at 0x90000004c7ded80
          _event_wait(??, ??) at 0x9000000007dfd58
          _cond_wait_local(??, ??, ??) at 0x9000000007edb04
          _cond_wait(??, ??, ??) at 0x9000000007ee0dc
          pthread_cond_wait(??, ??) at 0x9000000007eed48
          xtmTimerThread() at 0x900000000919d18
          ThreadMain() at 0x9000000008ee7f0

          Thread $t4:
          .() at 0x90000004c903f50
          xcsWaitFd() at 0x9000000009411a8
          xcsWaitSocket() at 0x900000000940784
          cccJobMonitor() at 0x900000000d23c44
          ThreadMain() at 0x9000000008ee7f0


        Assembler Instructions

          0x9000000007df300 (_event_sleep+0x580)   beq  0x9000000007df398
          0x9000000007df304 (_event_sleep+0x584)    li  r6,0x0
          0x9000000007df308 (_event_sleep+0x588)   ori  r5,r29,0x0
          0x9000000007df30c (_event_sleep+0x58c)  cmpi  cr4,0x1,r0,0x0
          0x9000000007df310 (_event_sleep+0x590)   ori  r0,r0,0x0
          0x9000000007df314 (_event_sleep+0x594)   ori  r0,r0,0x0
          0x9000000007df318 (_event_sleep+0x598)   ori  r0,r0,0x0
          0x9000000007df31c (_event_sleep+0x59c)   ori  r1,r1,0x0
          0x9000000007df320 (_event_sleep+0x5a0)    bl  0x9000000007e0450
          0x9000000007df324 (_event_sleep+0x5a4)    ld  r2,0x28(r1)
          0x9000000007df328 (_event_sleep+0x5a8)   sli  r0,r3,0x0
          0x9000000007df32c (_event_sleep+0x5ac)   stw  r3,0x70(r1)
          0x9000000007df330 (_event_sleep+0x5b0)  addi  r4,0x78(r1)
          0x9000000007df334 (_event_sleep+0x5b4)    ld  r3,0x30(r31)
          0x9000000007df338 (_event_sleep+0x5b8)   ori  r5,r31,0x0
          0x9000000007df33c (_event_sleep+0x5bc)   ori  r6,r30,0x0
          0x9000000007df340 (_event_sleep+0x5c0)  cmpi  cr0,0x1,r3,0x0
          0x9000000007df344 (_event_sleep+0x5c4)  addi  r7,0x70(r1)
          0x9000000007df348 (_event_sleep+0x5c8)   beq  0x9000000007df1ec
          0x9000000007df34c (_event_sleep+0x5cc)    bl  0x9000000007df6a0
          0x9000000007df350 (_event_sleep+0x5d0)  cmpi  cr0,0x0,r3,0x0


      Analyzing process 3145844                   30 June 2012 at 00:39:36 GMT
      ========================================================================

             PID     PPID  STARTED  EUSER   EGROUP  COMMAND
         3145844  3211324   Mar 29   root   system  /usr/sbin/inetd


      stackit: You must run stackit as root to analyze process 3145844.
      stackit: Analysis failed for process 3145844.
               
               

      Summary of results                          30 June 2012 at 00:39:36 GMT
      ========================================================================
      stackit: Success rate: 50%
      stackit: Stackit failed to analyze processes which belong to another
               user.  Run stackit as root to analyze those processes.

    DISCLAIMER: All source code and/or binaries attached to this document are referred to here as "the Program". IBM is not providing program services of any kind for the Program. IBM is providing the Program on an "AS IS" basis without warranty of any kind. IBM WILL NOT BE LIABLE FOR ANY ACTUAL, DIRECT, SPECIAL, INCIDENTAL, OR INDIRECT DAMAGES OR FOR ANY ECONOMIC CONSEQUENTIAL DAMAGES (INCLUDING LOST PROFITS OR SAVINGS), EVEN IF IBM, OR ITS RESELLER, HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

    Product Alias/Synonym

    WMQ MQ


    Document information


    More support for:

    WebSphere MQ
    Problem Determination

    Software version:

    5.3, 6.0, 7.0, 7.1, 7.5

    Operating system(s):

    AIX, HP-UX, Linux, Solaris

    Software edition:

    All Editions

    Reference #:

    1179404

    Modified date:

    2013-06-14
