Troubleshooting a Hung Process or Command on PowerVM Virtual I/O Server
This technote describes how to troubleshoot a hung process or command on PowerVM Virtual I/O Server (VIOS) before resorting to forcing a system dump.
This applies to PowerVM Virtual I/O Server version 2.2.
Diagnosing the problem
See NOTE 1 in step 3 to determine whether the command in question is actually hung or is merely delayed.
Resolving the problem
1. Download pdump.sh from ftp://ftp.software.ibm.com/aix/tools/debug/ and transfer it by FTP, in binary mode, to the VIOS, logging in as padmin (by default, you are placed in the /home/padmin directory).
2. Log in to the VIOS as padmin and make the script executable:
$ chmod 755 pdump.sh
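If you prefer to script the transfer in step 1 from the workstation where you downloaded pdump.sh, the following is a minimal sketch. It assumes FTP is enabled on the VIOS and that your ftp client supports the common -n option and the user, binary, put, and bye subcommands. The function name ftp_upload_cmds, the host name vios1, and the VIOS_PASS variable are hypothetical placeholders.

```sh
# ftp_upload_cmds USER PASS FILE
# Hypothetical helper: prints ftp subcommands that log in as USER,
# switch to binary mode (so pdump.sh is copied byte for byte), upload
# FILE, and exit. Meant to be piped into `ftp -n <vios-host>`.
ftp_upload_cmds() {
    printf 'user %s %s\nbinary\nput %s\nbye\n' "$1" "$2" "$3"
}

# Example (vios1 and VIOS_PASS are placeholders):
#   ftp_upload_cmds padmin "$VIOS_PASS" pdump.sh | ftp -n vios1
```

Because printf is a shell builtin, the password is written to ftp's standard input rather than appearing in the process list.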
3. Switch to the root shell (run oem_setup_env as padmin) and find the process ID (PID) of the hung process or command:
# ps -ef | grep <hung command>
The PID is the second field, the number right after the user name.
The following example uses the padmin snap command as the hung command; its PID is 11993246.
# ps -ef|grep snap
root 8060958 8585354 0 13:30:17 pts/2 0:00 grep snap
padmin 9830500 11993246 0 13:30:12 pts/3 0:00 /bin/ksh /usr/sbin/snap -r
padmin 11993246 9109512 0 13:30:12 pts/3 0:00 ioscli snap
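To pull the PID out of the ps -ef listing without reading the columns by eye, a small awk filter can print field 2 of each matching line. This is a hypothetical helper (find_pid is not an IBM tool), and it assumes the standard ps -ef column order (UID PID PPID C STIME TTY TIME CMD) shown above.

```sh
# find_pid PATTERN
# Hypothetical helper: reads `ps -ef` output on stdin and prints the PID
# (field 2) of every line that contains PATTERN, skipping grep/awk lines
# so the search itself is not reported.
find_pid() {
    awk -v pat="$1" '
        index($0, pat) == 0      { next }       # line does not mention PATTERN
        $0 ~ /[ \/](grep|awk) /  { next }       # our own search command
                                 { print $2 }   # field 2 of ps -ef is the PID
    '
}

# Example, run from the root shell:
#   ps -ef | find_pid "ioscli snap"
```

Matching on "ioscli snap" rather than just "snap" avoids also picking up the child /usr/sbin/snap process.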
NOTE 1: Sometimes a command is mistakenly considered hung when in reality it is just taking a long time to return. This may be expected in VIOS environments with large storage and/or virtual configurations. Run the proctree command to determine whether the "hung" PID has spawned any child processes. If so, get the PID of the youngest child process (the last one in the tree). In the following example, it is 7274712.
# proctree 11993246
10092564 telnetd -a
11993246 ioscli snap
9830502 /bin/ksh /usr/sbin/snap -a -c
8978460 /bin/sh /usr/lib/ras/snapscripts/svCollect all
8061118 /bin/sh /usr/lib/ras/snapscripts/svCollect all
7274712 kdb -script
Wait a few minutes, then re-run the command (you can repeat this a few times) and check whether the youngest child process (7274712 in this case) changes. If it does, the command is most likely still running and not hung.
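The wait-and-compare check described above can be scripted roughly as follows. This is a sketch, assuming proctree prints one process per line with the PID as the first field, as in the example output; youngest_pid and watch_youngest are hypothetical helper names, and the default interval (120 seconds) and number of checks (3) are arbitrary choices.

```sh
# youngest_pid
# Hypothetical helper: reads `proctree` output on stdin and prints the PID
# (first field) of the last non-empty line, i.e. the youngest child.
youngest_pid() {
    awk 'NF { pid = $1 } END { print pid }'
}

# watch_youngest PID [INTERVAL] [CHECKS]
# Re-runs proctree every INTERVAL seconds (default 120), up to CHECKS times
# (default 3). Returns 0 as soon as the youngest child changes (or the tree
# disappears because the command finished), 1 if it never changes.
watch_youngest() {
    wy_root=$1
    wy_interval=${2:-120}
    wy_checks=${3:-3}
    wy_prev=$(proctree "$wy_root" 2>/dev/null | youngest_pid)
    wy_i=0
    while [ "$wy_i" -lt "$wy_checks" ]; do
        sleep "$wy_interval"
        wy_cur=$(proctree "$wy_root" 2>/dev/null | youngest_pid)
        if [ "$wy_cur" != "$wy_prev" ]; then
            echo "youngest child changed: $wy_prev -> ${wy_cur:-none}; probably still running"
            return 0
        fi
        wy_i=$((wy_i + 1))
    done
    echo "youngest child still $wy_prev after $wy_checks checks; possibly hung"
    return 1
}

# Example, run from the root shell:
#   watch_youngest 11993246 120 3
```

An unchanged youngest PID only suggests a hang; a long-running child (such as kdb collecting data) can legitimately stay put for several minutes, so allow enough checks for your configuration.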
4. If the command is indeed hung, run the pdump.sh tool against the last child PID listed at the bottom of the proctree output (7274712 in this example):
# ./pdump.sh -d <last child PID>
This creates an output file named pdump.<hung command>.<PID>.<date>.out in the current working directory.
# ./pdump.sh -d 7274712
Getting general environment data ...
Dumping process information from kdb ...
dumping process slot 2928 ...
Error getting thread list. Skip other kdb commands.
Dumping process information with proc tools ...
Dumping process information from dbx ...
dumping tid 1 ...
listing object files ...
Output file is pdump.ioscli.11993246.11Oct2010-14.24.54.out
-rw-r--r-- 1 root staff 85269 Oct 11 14:25 pdump.ioscli.7274712.11Oct2010-14.24.54.out
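Steps 3 and 4 can be combined into a small wrapper that runs pdump.sh against the youngest child of the hung PID and then reports the newest output file. This is a sketch, assuming pdump.sh sits in the current directory and accepts -d <PID> as shown above; pdump_youngest and latest_pdump are hypothetical names.

```sh
# pdump_youngest PID
# Hypothetical wrapper: finds the youngest child of PID with proctree
# (last line, first field) and runs ./pdump.sh -d against it.
pdump_youngest() {
    pd_child=$(proctree "$1" 2>/dev/null | awk 'NF { pid = $1 } END { print pid }')
    if [ -z "$pd_child" ]; then
        echo "proctree returned nothing for PID $1" >&2
        return 1
    fi
    ./pdump.sh -d "$pd_child"
}

# latest_pdump
# Prints the most recently modified pdump.*.out file in the current
# directory (ls -t lists newest first), or nothing if there is none.
latest_pdump() {
    ls -t pdump.*.out 2>/dev/null | head -n 1
}

# Example, run from the directory holding pdump.sh:
#   pdump_youngest 11993246 && latest_pdump
```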
5. Rename the file to reflect your PMR, then send the testcase. Example:
# mv <original_filename>.out 99999.888.000.<original_filename>.out
- 99999 is your PMR#
- 888 is your Branch#
- 000 is the country code (000 = USA)
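The renaming in step 5 can be wrapped so that the PMR, branch, and country code are checked before the file is moved. This sketch assumes the 5-digit.3-digit.3-digit layout of the 99999.888.000 example; pmr_rename is a hypothetical helper.

```sh
# pmr_rename PMR BRANCH COUNTRY FILE
# Hypothetical helper: checks that PMR, BRANCH, and COUNTRY follow the
# 5-digit.3-digit.3-digit layout of the 99999.888.000 example, then
# renames FILE in place to PMR.BRANCH.COUNTRY.<original file name>.
pmr_rename() {
    pr_prefix="$1.$2.$3"
    case "$pr_prefix" in
        [0-9][0-9][0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]) ;;
        *) echo "expected PMR.BRANCH.COUNTRY like 99999.888.000, got $pr_prefix" >&2
           return 1 ;;
    esac
    [ -f "$4" ] || { echo "no such file: $4" >&2; return 1; }
    mv "$4" "$(dirname "$4")/$pr_prefix.$(basename "$4")"
}

# Example:
#   pmr_rename 99999 888 000 pdump.ioscli.7274712.11Oct2010-14.24.54.out
```

The file stays in its original directory; only the name gains the PMR prefix.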
6. Where to send the testcase.
More support for:
Virtual I/O Server
Software version: 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.2.4
Operating system(s): AIX, Other
Software edition: Enterprise, Express, Standard
Reference #: T1012503
Modified date: 13 February 2012