IBM Support

Troubleshooting a Hung Process or Command on PowerVM Virtual I/O Server

Technote (troubleshooting)


This technote describes how to troubleshoot a hung process or command on PowerVM Virtual I/O Server (VIOS) before resorting to a forced system dump.


This applies to PowerVM Virtual I/O Server version 2.2.

Diagnosing the problem

See NOTE 1 in step 3 to determine whether the command in question is actually hung or is only delayed.

Resolving the problem

1. Download the tool and ftp it (binary mode) to the VIOS as padmin (by default, you will be placed in the /home/padmin directory).

2. Log in to the VIOS as padmin and change the permissions.

    $ chmod 755

3. Go to the root shell and find the process ID (PID) for the hung process or command.

    $ oem_setup_env

    # ps -ef |grep <hung command> =>Get the PID. It is the number after the user name

    The following example uses the padmin snap command as the hung command; its PID is 11993246.

    # ps -ef|grep snap
    root 8060958 8585354 0 13:30:17 pts/2 0:00 grep snap
    padmin 9830500 11993246 0 13:30:12 pts/3 0:00 /bin/ksh /usr/sbin/snap -r
    padmin 11993246 9109512 0 13:30:12 pts/3 0:00 ioscli snap
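
The PID lookup above can also be scripted. The following is a minimal sketch, not part of the original procedure: pids_for_command is a hypothetical helper name, and it assumes the default ps -ef column layout, in which the PID is the second field. It prints the PID of every line that mentions the command and skips the grep process itself.

```shell
# Hypothetical helper (not an IBM-supplied tool): read `ps -ef` output on
# stdin and print the PID (2nd column) of each line that contains the
# string given as $1, ignoring any line that mentions grep.
pids_for_command() {
    awk -v pat="$1" 'index($0, pat) && !index($0, "grep") { print $2 }'
}
```

From the root shell it could be used as `ps -ef | pids_for_command snap`. With the example output above it would list both 9830500 and 11993246; you still need to choose the PID that belongs to the command you started (ioscli snap, 11993246, in this example).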

      NOTE 1:
      Sometimes a command is mistakenly considered hung when in reality it is just taking a long time to return. This can be expected in VIOS environments with large storage and/or virtual configurations. Run the proctree command to determine whether the "hung" PID has spawned any child processes. If so, get the PID of the youngest child process (the last one in the tree). In the following example, it is 7274712.

      # proctree 11993246
      2228366 /usr/sbin/srcmstr
      9437368 /usr/sbin/inetd
      10092564 telnetd -a
      9109512 -rksh
      11993246 ioscli snap
      9830502 /bin/ksh /usr/sbin/snap -a -c
      8978460 /bin/sh /usr/lib/ras/snapscripts/svCollect all
      8061118 /bin/sh /usr/lib/ras/snapscripts/svCollect all
      7274712 kdb -script

      Wait a few minutes, then rerun the command (you can do this a few times) and see whether the youngest child process (7274712, in this case) changes. If it does, the command is most likely still running and is not hung.
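
The check in NOTE 1 can be sketched as a small script. This is an illustration under stated assumptions, not an IBM-supplied tool: youngest_child and check_progress are hypothetical names, the 300-second default interval is an arbitrary choice, and the script assumes that the last non-empty line of the proctree output belongs to the youngest child, as in the example above.

```shell
# Hypothetical helper: print the PID found on the last non-empty line of
# `proctree` output read from stdin.
youngest_child() {
    awk 'NF { pid = $1 } END { if (pid != "") print pid }'
}

# Hypothetical helper: sample the youngest child of PID $1 twice, $2 seconds
# apart (assumed default: 300), and report whether it changed in between.
check_progress() {
    first=$(proctree "$1" | youngest_child)
    sleep "${2:-300}"
    second=$(proctree "$1" | youngest_child)
    if [ "$first" != "$second" ]; then
        echo "youngest child changed ($first -> $second): probably still running"
    else
        echo "youngest child unchanged ($first): possibly hung"
    fi
}
```

For the example above, `check_progress 11993246` would compare two samples taken five minutes apart. A single unchanged result is not proof of a hang; as the note says, repeat the check a few times before concluding that the command is hung.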

4. If the command is indeed hung, run the tool against the last child PID listed at the bottom of the proctree output (7274712, in this example).

    # ./ -d <last child PID> ==>will create output file pdump.<hung command>.<PID>.<date>.out in the current working directory

    # ./ -d 7274712

      Getting general environment data ...
      Dumping process information from kdb ...

      dumping process slot 2928 ...
      Error getting thread list. Skip other kdb commands.

      Dumping process information with proc tools ...

      Dumping process information from dbx ...

      dumping tid 1 ...
      listing object files ...

      Output file is pdump.ioscli.7274712.11Oct2010-14.24.54.out
    # ls -la pdump.ioscli.7274712.11Oct2010-14.24.54.out
    -rw-r--r-- 1 root staff 85269 Oct 11 14:25 pdump.ioscli.7274712.11Oct2010-14.24.54.out

5. Rename the file to reflect your PMR and send the testcase. Example:

    # mv <original_filename>.out 99999.888.000.<original_filename>.out

    where:
    - 99999 is your PMR#
    - 888 is your Branch#
    - 000 is the USA country code

    The renamed file then looks like this:

    -rw-r--r-- 1 root staff 85269 Oct 11 14:25 99999.888.000.pdump.ioscli.7274712.11Oct2010-14.24.54.out
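
The naming convention in step 5 can be captured in a small helper so that the prefix is built the same way every time. This is a sketch only: testcase_name is a hypothetical function name, and the PMR, branch, and country values are the placeholder numbers from the example above.

```shell
# Hypothetical helper: print the testcase file name
# <PMR#>.<Branch#>.<country code>.<original file name>.
# The directory part of $4, if any, is dropped from the result.
testcase_name() {
    printf '%s.%s.%s.%s\n' "$1" "$2" "$3" "$(basename "$4")"
}
```

Run from the directory that holds the output file, a rename could then look like `f=pdump.ioscli.7274712.11Oct2010-14.24.54.out; mv "$f" "$(testcase_name 99999 888 000 "$f")"`.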

6. Where to send the testcase.

Document information

More support for: Virtual I/O Server

Software version: 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.2.4

Operating system(s): AIX, Other

Software edition: Enterprise, Express, Standard

Reference #: T1012503

Modified date: 13 February 2012