IBM Support

How to force a system dump

Question & Answer


Question

How can I force a system dump to obtain debugging information?

Answer


Introduction

This document describes the various methods that can be used to force a system dump. A system dump is a snapshot of the internal state of the operating system at the time of a system crash, or when a user resets the system by forcing a system dump. After the OS is no longer running, this snapshot is most often written to a logical volume specifically created for dump data, but may also be written into a paging space logical volume, and then later copied to a dump file after the system reboots. After the dump data has been written into a logical volume, the system will reboot if the auto-reboot flag is enabled. A user might decide to force a dump for any number of reasons, including slow system performance, network problems, or login problems. However, these types of problems are best handled by working with a live system instead of forcing a dump.

When to force a system dump

Generally you should only force a system dump when:

  • The system is completely hung.

    Note: Sometimes a system might hang during a shutdown operation. The shutdown command is a script that executes commands sequentially. If any command within this script hangs, the entire system will appear to hang because the shutdown process will stop. Generally forcing a dump at this time is not very useful, unless the command that is hanging within the shutdown script can be identified.
     
  • One or more processes are hung in kernel mode and cannot be killed.

    Note: Forcing a dump for this type of condition normally should be a last resort. The pdump script can be used first if there are only one or a few processes that are hanging.
     
  • Out of memory errors are returned when running a number of commands, indicating a possible problem with kernel memory.
     
  • A file system is hanging, meaning that one or more file system or Logical Volume Manager (LVM) related commands hang when operating within a specific file system.

How to force a system dump

There are a number of different methods to force a system dump. The method used depends on the type of hardware, whether the system is running as an LPAR or stand-alone, and whether the LPAR is managed by an HMC (Hardware Management Console). The AIX command sysdumpstart can also be used to initiate a system dump, but in most cases a dump is forced because it is not possible to run any commands on the system.

Note: while a forced dump is being written to disk, a 0c2 code will be displayed in the LED window on the front panel, or in the LPAR status area on the HMC. After the dump process has finished, a 0c0 will be displayed if the dump completed successfully or a different code if there was an error. The machine should not be interrupted while the 0c2 code is displayed.
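The codes in the note above can be summarized as a small lookup. This is only an illustrative sketch: the function name dump_code_meaning is hypothetical, and accepting a four-digit 00C0 form is an assumption made by analogy with the 00C2 form that appears later in this document.

```shell
#!/bin/sh
# Map a dump progress code to the meaning described in this document.
# Input is case-insensitive. The 4-digit 00c0 form is an assumption.
dump_code_meaning() {
    code=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$code" in
        0c2|00c2) echo "dump in progress: do not interrupt the system" ;;
        0c0|00c0) echo "dump completed successfully" ;;
        *)        echo "other code: if shown after a dump, the dump may have failed" ;;
    esac
}

dump_code_meaning 0c2
dump_code_meaning 00C2
dump_code_meaning 0c0
```

Any code other than the two listed is reported only as a possible error, because this document does not enumerate the individual error codes.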

Stand-alone using front panel
NOTE: This section applies to obsolete hardware made by IBM prior to Power servers that could be managed by an HMC.  The section is kept for historical purposes.

On non-LPAR stand-alone systems, a dump cannot be forced unless the Always Allow Dump flag is set to TRUE. This flag is ignored on LPARs. Ensure this flag is enabled before the need to force a dump arises.

  • sysdumpdev -l
    Show the current setting of Always Allow Dump flag
     
  • sysdumpdev -K
    Set Always Allow Dump flag to TRUE
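The check described above can be scripted so that the flag is verified ahead of time. The following is a minimal sketch, not an official procedure: the helper name always_allow_dump is hypothetical, and the sample text is only assumed to resemble sysdumpdev -l output, so the exact layout should be confirmed on the system in question.

```shell
#!/bin/sh
# Read `sysdumpdev -l` output on stdin and print the value of the
# "always allow dump" line. Prints nothing if the line is absent.
always_allow_dump() {
    awk '/^always allow dump/ { print $NF; exit }'
}

# Sample text assumed to resemble `sysdumpdev -l` output (illustrative only).
sample_output='primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON'

flag=$(printf '%s\n' "$sample_output" | always_allow_dump)
if [ "$flag" = "TRUE" ]; then
    echo "Always Allow Dump is already TRUE"
else
    echo "Always Allow Dump is ${flag:-unknown}; run 'sysdumpdev -K' as root to enable it"
fi
```

On a live AIX system the sample text would be replaced by the real command, for example: sysdumpdev -l | always_allow_dump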

To force a dump on most stand-alone systems, use the reset button, soft power button, or keyboard.

  • Reset Button
    Most stand-alone systems have a yellow reset button that can be used to force a dump. This button should be pressed for about 5 seconds or until the 0c2 code is displayed in the LED window.
     
  • Soft Power Button
    Some older systems do not have a reset button. Instead, the soft power button is used both to power on and off the machine, and to reset the machine to generate a system dump. For machines with a soft power button but no reset button, press and hold the soft power button for about 5 seconds or until 0c2 is displayed in the LED window.
     
  • Ctrl-Alt-Numpad1
    For systems that do not have a yellow reset button or a soft power button that doubles as a reset button, try pressing Ctrl-Alt and the 1 key on the numeric keypad on the console. If the system has an LED panel, the 0c2 code will be displayed until the dump process is finished, at which time another code will be displayed to indicate the status of the dump. If the system does not have an LED panel, the dump process will be completed after disk activity stops.

LPAR using HMC

A Hardware Management Console (HMC) is a system with a graphical user interface that can be used to manage LPARs. Currently this system is a PC running Linux that also runs a web server for remote access. A user can log on directly to the HMC, or may access the system remotely with a web browser. HMC functions, such as forcing a system dump, can be accessed through a command line or a GUI.

  • HMC command line
    chsysstate -m managedSystemName -r lpar -n lparName -o dumprestart

     
  • HMC GUI (HMC version 7 and above)
    1. In the navigation area, open Systems Management->Servers and click the managed system where the LPAR is located
    2. In the contents area, select the LPAR
    3. Click the Tasks button and choose Operations->Restart
    4. Select the dump option
    5. Click OK
       
  • HMC GUI (HMC version 6 and below)
    1. In the navigation area, open Server->Partition
    2. Click Server Management
    3. In the contents area, open the server that has the LPAR
    4. Open Partitions
    5. Right-click the LPAR and select Restart Partition
    6. Select the dump option
    7. Click OK
       
  • HMC GUI on a POWER4
    1. In the Contents area, select the partition
    2. In the menu, click Selected
    3. Select Operating System Reset
    4. Select Soft Reset
    5. Click Yes
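
Because a forced dump resets the partition, it can help to build and review the HMC command line before running it. The sketch below only prints the command and executes nothing. The function name hmc_dump_cmd and the system and partition names are hypothetical placeholders.

```shell
#!/bin/sh
# Build the HMC dump-restart command for one LPAR and print it for review.
# Arguments: managed system name, LPAR name.
# Names containing single quotes are not handled by this simple quoting.
hmc_dump_cmd() {
    if [ $# -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: hmc_dump_cmd MANAGED_SYSTEM LPAR_NAME" >&2
        return 1
    fi
    printf "chsysstate -m '%s' -r lpar -n '%s' -o dumprestart\n" "$1" "$2"
}

# Hypothetical names; nothing is executed here.
hmc_dump_cmd "Server-9117-MMA-SN0000000" "aixprod01"
```

The printed line can then be pasted into an HMC command-line session after the managed system and partition names have been double-checked.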
LPAR using Integrated Virtualization Manager (IVM)

To force a system dump on one of the AIX partitions managed by the IVM, run the following command on the IVM command line interface:

chsysstate -r lpar -o dumprestart -n AIX_PARTITION_NAME

LPAR using NovaLink (NL)

The command syntax is:

pvmctl LogicalPartition dumprestart {--object-id | -i} <LogicalPartition field>=value {args}

    optional arguments:
          --no-prompt           Assume a positive response to all prompted input.
    identifier arguments:
          --object-id id=value, -i id=value

The identifier is used when querying for an object type. It is specified as a "name=value" pair. Example: --object-id name=value

An example of using this command on a partition is:

pvmctl LogicalPartition dumprestart -i id=53

where 53 is the LPAR ID of the LPAR you want to dump. If you know the LPAR name, you can use:

pvmctl lpar dumprestart -i name="lparname"

where "lparname" is the name of the partition you wish to reset. Note that "lpar" and "LogicalPartition" can be used interchangeably.
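
The two identifier forms shown above (id= and name=) can be selected automatically. This is a rough sketch under one stated assumption: a purely numeric argument is treated as an LPAR ID and anything else as an LPAR name. The helper name pvmctl_dump_cmd is hypothetical, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Build a pvmctl dumprestart command from either an LPAR ID or an LPAR name.
# Assumption: all-digit input is an ID; any other input is a name.
pvmctl_dump_cmd() {
    case "$1" in
        "")       echo "usage: pvmctl_dump_cmd LPAR_ID_OR_NAME" >&2; return 1 ;;
        *[!0-9]*) printf 'pvmctl lpar dumprestart -i name="%s"\n' "$1" ;;
        *)        printf 'pvmctl lpar dumprestart -i id=%s\n' "$1" ;;
    esac
}

# Print both forms for review; nothing is executed here.
pvmctl_dump_cmd 53
pvmctl_dump_cmd lparname
```

A partition whose name consists only of digits would be misread as an ID by this heuristic, so in that case the name= form should be written out by hand.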
VIO client LPAR using padmin command on VIO server

VIO LPARs are often managed from an HMC or IVM, just like regular LPARs. If the HMC or IVM is unavailable or inaccessible, a dump can be forced using the command line on the VIOS. To do this, log into the VIOS as padmin and run the following command:

chsysstate -r lpar -o dumprestart { -n Name | --id PartitionID } [ -m ManagedSystem ]  


Example:

chsysstate -r lpar -o dumprestart --id 3

Stand-alone POWER5, POWER6, POWER7, or POWER8 using front panel

This procedure can be used to force a dump on a POWER5, POWER6, POWER7, or POWER8 that is not managed by an HMC.

  1. Put the system in manual mode
    1. On the front panel, press one of the increment buttons until function 02 is displayed in the top left corner
    2. Press the Enter button (middle button) to activate function 02
    3. Press the Enter button again to move the selector < to the middle character, which should be an N for Normal
    4. Press an increment button to change the middle character to an M for Manual
    5. Press the Enter button one or two times until the display shows only function 02 again
       
  2. Force the system dump
    1. On the front panel, press one of the increment buttons until function 22 is displayed in the top left corner
    2. Press the Enter button. The operator panel should display A1003022. This code represents the confirmation message, "Are you sure?"
    3. If instead the display shows 22 00 press an increment button to select the partition to dump (00 is the first partition) and then press the Enter button. Now the display should show A1003022
    4. Press one of the increment buttons until function 22 is again displayed in the top left corner
    5. Press the Enter button to confirm and activate function 22, which will initiate the system dump and display code 00C2 while the system is writing the dump file
       
  3. Put the system back in normal mode while the dump file is writing
    1. On the front panel, press one of the increment buttons until function 02 is displayed in the top left corner
    2. Press the Enter button (middle button) to activate function 02
    3. Press the Enter button again to move the selector < to the middle character, which should be an M for Manual
    4. Press an increment button to change the middle character to an N for Normal
    5. Press the Enter button one or two times until the display shows only function 02 again
       
  4. Do not interrupt the system as long as the code 00C2 is being displayed. When the dump process completes, the system will halt unless auto-reboot is enabled. If the system does not automatically reboot, recycle power to boot the system.
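
Whether the system reboots by itself after the dump depends on the auto-reboot setting, which can be checked ahead of time while AIX is still running. This document does not name the attribute. The sketch below assumes it is the autorestart attribute of the sys0 device, as reported by lsattr. The helper name autorestart_value is hypothetical and the sample line is illustrative only.

```shell
#!/bin/sh
# Read `lsattr -El sys0 -a autorestart` output on stdin and print the
# value column (second field) of the autorestart line.
autorestart_value() {
    awk '$1 == "autorestart" { print $2; exit }'
}

# Sample line assumed to resemble the lsattr output (illustrative only).
sample_line='autorestart true Automatically REBOOT OS after a crash True'

value=$(printf '%s\n' "$sample_line" | autorestart_value)
if [ "$value" = "true" ]; then
    echo "auto-reboot is enabled"
else
    echo "auto-reboot is ${value:-unknown}; the system may halt after the dump"
fi
```

On a live system the sample would be replaced with: lsattr -El sys0 -a autorestart | autorestart_value. Under the same assumption, the setting could be changed as root with chdev -l sys0 -a autorestart=true.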
Blade using AMM browser

Blades can be managed using the AMM browser interface. See this link for forcing a system dump on a JS12, JS20, JS21 (Type 7988, 8844) and JS22 using the AMM browser interface.

VIO server on Blade using AMM browser

See technote T1012839.

Any system using the sysdumpstart command

Any system with AIX up and running that has a properly configured dump device can be dumped by running the sysdumpstart command. This command is normally used for testing purposes since a system dump is most often needed when a system is hung and unable to run any commands, including the sysdumpstart command. But sometimes AIX Support will recommend dumping a system that is not completely hung using the sysdumpstart command.

Run the following command to initiate a system dump to the primary dump device.

sysdumpstart -p

After running this command, the system will immediately halt and begin writing the system dump. If auto-reboot is enabled, the system will automatically reboot after the dump process completes. If auto-reboot is not enabled, the system will halt. Recycle power to boot the system if it is halted.
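
Because sysdumpstart -p halts the system immediately, a guarded wrapper can confirm that a primary dump device is configured before anything is started. This is a sketch under stated assumptions, not a supported tool: the function names and the RUN_DUMP variable are hypothetical, the parsing assumes sysdumpdev -l prints a line beginning with "primary" followed by the device, and /dev/sysdumpnull is treated as "no dump device".

```shell
#!/bin/sh
# Print the primary dump device from `sysdumpdev -l` output on stdin.
primary_dump_device() {
    awk '$1 == "primary" { print $2; exit }'
}

# Succeed only if the device is non-empty and not the null dump device.
dump_device_ok() {
    case "$1" in
        ""|/dev/sysdumpnull) return 1 ;;
        *)                   return 0 ;;
    esac
}

# Dry run by default: the dump is started only when RUN_DUMP=yes is set.
start_dump_if_ready() {
    dev=$(sysdumpdev -l 2>/dev/null | primary_dump_device)
    if ! dump_device_ok "$dev"; then
        echo "no usable primary dump device; not starting a dump" >&2
        return 1
    fi
    if [ "${RUN_DUMP:-no}" = "yes" ]; then
        sysdumpstart -p
    else
        echo "would run: sysdumpstart -p (primary device $dev)"
    fi
}
```

Calling start_dump_if_ready with no environment changes only reports what would happen. Running it as RUN_DUMP=yes start_dump_if_ready actually starts the dump, so that form belongs only where a dump has been requested.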

Document Information

Modified date:
19 March 2024

UID

isg3T1019210