Crash on AIX produces no core or a truncated core

Technote (troubleshooting)


Problem (Abstract)

This document describes what to do to ensure that AIX produces a full core file if WebSphere Application Server crashes.

Resolving the problem

Follow these steps in order unless IBM Support directs otherwise.

NOTE: The settings below take effect only after you restart the application server. If you use a node agent to start your servers, restart the node agent as well. For ulimit changes, the restart must also be issued from the same command-line (terminal) session in which you ran the ulimit commands.



1. Setting Ulimits
See Also: Guidelines for setting ulimits

To set the core file size and file size ulimits to unlimited, run these two commands as the user who starts the node agent, the application server, or both:

ulimit -c unlimited
ulimit -f unlimited


You can run ulimit -a to verify current ulimit settings.
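Because a ulimit change applies only to the session in which it is made, a quick check in that same session before starting the server can catch a missed command. This is a minimal sketch; the function name check_core_ulimits is hypothetical and not part of WebSphere or AIX.

```shell
# Hypothetical helper: verify that the current session allows unlimited
# core and file sizes. Run it in the same session that will start the
# node agent or application server.
check_core_ulimits() {
    status=0
    core=$(ulimit -c)    # soft limit on core file size
    fsize=$(ulimit -f)   # soft limit on the size of files the process may write
    if [ "$core" != "unlimited" ]; then
        echo "core file size limit is $core (expected unlimited)" >&2
        status=1
    fi
    if [ "$fsize" != "unlimited" ]; then
        echo "file size limit is $fsize (expected unlimited)" >&2
        status=1
    fi
    return $status
}
```

For example, run the two ulimit commands, then start the server only if the check passes: check_core_ulimits && <WAS_HOME>/bin/startServer.sh server1 (substitute your own server name for server1).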

Ulimits can also be altered at a global level. See the FAQ for more information.



2. Configuring the Operating System for Full Core Generation

If you do not have access to the SMIT administration tool, set the following attribute from the command line as the root user:

To set full core generation:
chdev -l sys0 -a fullcore=true


To verify full core is set:
lsattr -El sys0 | grep fullcore
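To script this verification, the lsattr output can be checked for the fullcore value. This sketch assumes the usual lsattr -E layout (attribute name, value, description, user-settable flag) and reads that output on standard input; the helper name fullcore_enabled is hypothetical.

```shell
# Hypothetical helper: read "lsattr -El sys0 -a fullcore" output on stdin.
# Assumed layout per line: <attribute> <value> <description...> <settable>
fullcore_enabled() {
    value=$(awk '$1 == "fullcore" { print $2; exit }')
    if [ "$value" = "true" ]; then
        echo "fullcore is enabled"
        return 0
    fi
    echo "fullcore is '${value:-not reported}'; run: chdev -l sys0 -a fullcore=true" >&2
    return 1
}
```

Usage: lsattr -El sys0 -a fullcore | fullcore_enabled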





Additional steps if still unable to capture a core file


3. Disable Signal Handlers

Sometimes a loaded library or an external process traps signals that the JVM relies on, especially signal 3 (SIGQUIT) and signal 11 (SIGSEGV), which prevents the JVM from generating a core file.

    a. Disable MQ Signal Traps

    WebSphere MQ is known to trap a subset of the signals that the JVM also uses. If you use WebSphere MQ, or are not sure whether you do, add this environment variable to your configuration:


    name:  MQS_NO_SYNC_SIGNAL_HANDLING
    value: true



    b. Disable All Signal Handlers

    To force the operating system to handle all signals sent to the JVM process, you can disable all JVM signal handlers.

    For IBM SDK 5.0 and later, set this JVM argument:
    -Xrs
    NOTE: On SDK 6.0, clear the shared class cache by running <WAS_HOME>/bin/clearClassCache.sh to prevent unintended crashes caused by SIGTRAP.

    For prior versions of the IBM SDK, set this environment variable:
    name:  IBM_NOSIGHANDLER
    value: true  




4. Disable Javacore Generation

In rare cases, disabling javacore generation helps produce a core file.
To disable it, add the following environment variable:

name:  DISABLE_JAVADUMP
value: true
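After restarting, it can help to confirm that the settings from steps 3 and 4 actually reached the Java process. This is a minimal sketch that assumes ps eww <Java_PID> on your AIX level prints the process command line together with its environment; report_dump_settings is a hypothetical helper that reads that listing on standard input.

```shell
# Hypothetical helper: report which signal-handling and javacore settings
# appear in a process listing (command line plus environment) read on stdin.
report_dump_settings() {
    listing=$(cat)
    for setting in \
        "MQS_NO_SYNC_SIGNAL_HANDLING=true" \
        "-Xrs" \
        "IBM_NOSIGHANDLER=true" \
        "DISABLE_JAVADUMP=true"
    do
        case "$listing" in
            *"$setting"*) echo "present: $setting" ;;
            *)            echo "missing: $setting" ;;
        esac
    done
}
```

Usage: ps eww <Java_PID> | report_dump_settings. Because -Xrs applies to IBM SDK 5.0 and later while IBM_NOSIGHANDLER applies to earlier SDKs, expect one of those two to be reported as missing.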



5. Execute pdump.sh

If core files are still not produced, run the attached script pdump.sh to extract information from the running process. This is especially helpful if you suspect that the process is in a zombie state and does not respond to any signals.

You can download the latest version from this location:
ftp://ftp.software.ibm.com/aix/tools/debug/pdump.sh


    pdump.sh <Java_PID>



This creates a file named pdump.java.###.txt. Locate the line that contains the string "sigcatch". If SEGV is listed on that line, signal 11 is being caught. SEGV and SIGSEGV both refer to signal 11.
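The check above can be scripted. This sketch assumes only that the caught signals are listed on the same line as the string "sigcatch"; the helper name segv_is_caught is hypothetical. Since "SIGSEGV" contains "SEGV", one match covers both spellings.

```shell
# Hypothetical helper: succeed if any line of a pdump.sh report contains
# both "sigcatch" and "SEGV" (which also matches "SIGSEGV").
segv_is_caught() {
    report=$1
    if awk '/sigcatch/ && /SEGV/ { found = 1 } END { exit !found }' "$report"; then
        echo "signal 11 (SEGV) is being caught: $report"
        return 0
    fi
    echo "no SEGV on any sigcatch line: $report"
    return 1
}
```

Usage: segv_is_caught pdump.java.###.txt, substituting the actual file name.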





Frequently Asked Questions (FAQ)


What happens if I do not have write permission in the profile's root directory, or the directory I am redirecting javacores, heapdumps, and system core files to?

Writing these files will fail. The error may be recorded in native_stderr.log.

Also make sure that you have enough free space on your file system.
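Both conditions can be checked before a crash occurs. This sketch assumes df supports the POSIX output format (df -P -k), in which the fourth column is the available space in kilobytes; check_dump_dir is a hypothetical helper, and the minimum free space is a value you choose.

```shell
# Hypothetical helper: check that a dump directory exists, is writable by
# the current user, and has at least MIN_KB kilobytes available.
check_dump_dir() {
    dir=$1
    min_kb=$2
    if [ ! -d "$dir" ]; then
        echo "$dir: not a directory" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "$dir: not writable by $(id -un)" >&2
        return 1
    fi
    # POSIX df format: line 2, column 4 is the available space in KB.
    avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$avail_kb" -lt "$min_kb" ]; then
        echo "$dir: only ${avail_kb} KB free (need ${min_kb} KB)" >&2
        return 1
    fi
    echo "$dir: writable, ${avail_kb} KB free"
}
```

For example, check_dump_dir <WAS_HOME>/profiles/<PROFILE_NAME> 4194304 requires 4 GB (4 x 1024 x 1024 KB) of free space. Run it as the user who starts the server so that the writability test is meaningful.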



Why are core files truncated at 2 GB even though all ulimit settings are unlimited?

This is a limitation of 32-bit processes. To avoid it, enable large file support on the file system that receives the core file, or use a 64-bit version of WebSphere Application Server.
For the first option, select the Large File Enabled option when you add a journaled file system. Refer to the AIX operating system documentation for additional details.

Additionally, running out of free space can cause file truncation.



Can I test my configuration to see if a core can be generated?

Yes. You can simulate a crash by sending signal 11 (SIGSEGV) to the JVM process. This terminates the process.

kill -11 PID

An alternative is to use the gencore command. This will produce a core file and will allow the process to continue running.

gencore PID
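If you script the kill -11 test, a small guard avoids signalling the wrong target by mistake. This is a minimal sketch with the hypothetical helper name simulate_crash; like kill -11, it terminates the target process, so use gencore PID instead when the process must keep running.

```shell
# Hypothetical helper: send signal 11 (SIGSEGV) to a process after checking
# that the PID is numeric and that the process exists.
simulate_crash() {
    pid=$1
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: simulate_crash <numeric PID>" >&2
            return 2
            ;;
    esac
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no process with PID $pid (or no permission to signal it)" >&2
        return 1
    fi
    kill -11 "$pid"
}
```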


Are ulimit settings permanent?

No. Settings made with the ulimit command last only as long as the session in which they were set. Limits are defined per user, but a ulimit command applies to the current session (for example, a command-line window) and to the processes started from it. A new session that is not spawned from the current one starts with the default limits.
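The session scope described above can be demonstrated directly: a child shell starts with the limits of the session that created it, and a change made in the child does not flow back. This sketch uses the hypothetical name show_ulimit_scope; lowering a soft limit needs no special privileges.

```shell
# Hypothetical demo: lower the soft core limit in a subshell and show that
# the parent session's limit is unchanged afterwards.
show_ulimit_scope() {
    echo "before:   $(ulimit -c)"
    ( ulimit -S -c 0; echo "subshell: $(ulimit -c)" )
    echo "after:    $(ulimit -c)"
}
```

The "before" and "after" lines show the same value, while the "subshell" line shows 0.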



Can I set ulimit settings globally?
Yes. Ulimit settings can be set globally by editing the /etc/security/limits file.

In the stanza for the user that runs the process, set fsize = -1 and core = -1. A value of -1 means unlimited.
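To check what a given user's stanza currently sets, the file can be read with awk. This sketch handles only the stanza layout shown in its comments and does not resolve values inherited from the default stanza; limits_for_user and the user name wasadmin are hypothetical.

```shell
# Hypothetical helper: print the core and fsize values defined in one user's
# stanza of an AIX limits file. Assumed stanza layout:
#   wasadmin:
#           fsize = -1
#           core = -1
# Lines starting with "*" are comments. Values missing from the user's
# stanza fall back to the "default" stanza, which this sketch ignores.
limits_for_user() {
    awk -v user="$2" '
        /^[^ \t*].*:[ \t]*$/ {            # stanza header such as "wasadmin:"
            name = $0
            sub(/:[ \t]*$/, "", name)
            in_stanza = (name == user)
            next
        }
        in_stanza && /=/ {
            line = $0
            gsub(/[ \t]/, "", line)       # "core = -1" becomes "core=-1"
            split(line, kv, "=")
            if (kv[1] == "core" || kv[1] == "fsize")
                print kv[1] "=" kv[2]
        }
    ' "$1"
}
```

Usage: limits_for_user /etc/security/limits wasadmin should print fsize=-1 and core=-1 once the stanza is updated.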




Where can core files be generated?

Core files are normally written to the profile's root directory, but they can appear in several other locations. Search these locations first:
  • <WAS_HOME>/profiles/<PROFILE_NAME>
  • <WAS_HOME>/bin/
  • /tmp

If you cannot find a core file in any of these locations, search your entire machine for core* files.

On the IBM SDK, the environment variable IBM_COREDIR can be used to redirect core dumps to a different location.
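The search in the locations above can be scripted. This sketch checks each listed directory, plus the directory named by IBM_COREDIR if that variable is set in the current shell (which may differ from the server's own environment), for regular files whose names start with core; find_core_files is a hypothetical helper.

```shell
# Hypothetical helper: list core* files in the usual locations for a given
# WAS_HOME and profile name, plus $IBM_COREDIR when it is set.
find_core_files() {
    was_home=$1
    profile=$2
    for dir in "$was_home/profiles/$profile" "$was_home/bin" /tmp ${IBM_COREDIR:+"$IBM_COREDIR"}; do
        for f in "$dir"/core*; do
            if [ -f "$f" ]; then
                echo "$f"
            fi
        done
    done
}
```

Usage: find_core_files <WAS_HOME> <PROFILE_NAME>. If it prints nothing, fall back to a machine-wide search such as find / -type f -name 'core*' 2>/dev/null.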

Related information

Additional commands to ensure core generation
Submitting information to IBM support
Steps to getting support
MustGather: Read first
Troubleshooting guide

pdump.sh

Cross reference information
Segment | Product | Component | Platform | Version | Edition
Application Servers | WebSphere Application Server - Express | Hangs/performance degradation | AIX | 7.0, 6.1, 6.0.2, 5.1.1 |
Application Servers | Runtimes for Java Technology | Java SDK | AIX | |


Document information

More support for: WebSphere Application Server, Crash
Software version: 6.0.2, 6.1, 7.0, 8.0, 8.5
Operating system(s): AIX
Software edition: Base, Express, Network Deployment
Reference #: 1052642
Modified date: 2007-09-28
