IBM Support

Crash on AIX produces no core or a truncated core

Technote (troubleshooting)


Problem (Abstract)

This document outlines what needs to be done to ensure that a full core file is produced on AIX if WebSphere Application Server crashes.

Resolving the problem

WebSphere Application Server should generate a core dump file during a crash or when a dump is triggered manually (via kill -11, the gencore command, or the admin console), but a few conditions can truncate the dump file or prevent it from being generated.
NOTE: There is a different technote that discusses issues where the process does not record a crash event.


SET ULIMITS
See Also: Guidelines for Setting Ulimits

The ulimits for core and file need to be set so that both the hard and soft limits are unlimited. Changing them may require root access. For IBM SDKs, it is usually enough for the core setting to have its hard limit configured to unlimited.

If you change the limits on the command line, they apply only to that shell session, so you will need to restart your nodeagent from the same command line window. Your application server can then be started normally. If the installation does not have a nodeagent, the application server itself must be started from that command line window.

ulimit -c unlimited
ulimit -f unlimited


To set them at a global level, edit the /etc/security/limits file and change the core and file settings for both the hard and soft limits. However, if the application server is started by the init process at system startup, these settings will not take effect. In that case, use the ulimit commands directly in the init.d script.
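
As a rough illustration of the /etc/security/limits change described above, the sketch below prints a stanza that lifts the core and file size limits for one user. It assumes the standard AIX attribute names (core, core_hard, fsize, fsize_hard) and the AIX convention that -1 means unlimited. The user name wasadmin is a placeholder and does not come from this technote.

```shell
#!/bin/sh
# Print a /etc/security/limits stanza that lifts the core and file size
# limits for one user. On AIX, a value of -1 means unlimited.
# The user name passed in is a placeholder for the account that runs the server.
limits_stanza() {
    user=$1
    printf '%s:\n' "$user"
    printf '\tcore = -1\n'
    printf '\tcore_hard = -1\n'
    printf '\tfsize = -1\n'
    printf '\tfsize_hard = -1\n'
}

# Example: review the output, then merge it into /etc/security/limits as root.
limits_stanza wasadmin
```

The function only prints the stanza and does not edit the file, so the result can be reviewed before it is applied. A new login session is needed before the changed limits are picked up.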

To verify the change, you can use ulimit -a on the same command line.
If you want to validate an already running application server process, capture a javacore (kill -3 PID).
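
Because ulimit -a shows only one set of limits by default, it can help to print the soft and hard values separately. The following is a minimal sketch, assuming a POSIX-style shell whose ulimit accepts -S, -H, -c and -f. The function names are illustrative only.

```shell
#!/bin/sh
# Classify one limit value as reported by ulimit.
limit_status() {
    if [ "$1" = "unlimited" ]; then
        echo "ok"
    else
        echo "restricted"
    fi
}

# Report soft (-S) and hard (-H) limits for core (-c) and file (-f) size.
report_limits() {
    for res in c f; do
        for kind in S H; do
            value=$(ulimit -"$kind" -"$res" 2>/dev/null)
            echo "ulimit -$kind -$res: ${value:-unknown} ($(limit_status "$value"))"
        done
    done
}

report_limits
```

Run it in the same shell session that will start the nodeagent or application server, since the limits it reports are those of the current session.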


CONFIGURE FULL CORE ON THE OPERATING SYSTEM
Check your OS configuration (in the SMIT tool) to see if the fullcore option is set to true.

If this option is not set, the IBM SDK writes the following message to native_stderr.log (or wherever your standard error output is directed) when a core dump is generated:

Note: "Enable full CORE dump" in smit is set to FALSE and as a result there will be limited threading information in core file.

If you do not have access to the SMIT administration tool, the following flag can be set from the command line (as the root user):

To set full core generation:
chdev -a fullcore=true -lsys0

To verify full core is set:
lsattr -Elsys0 | grep full
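
The verification step above can be scripted. This sketch assumes the usual lsattr column layout (attribute, value, description, user-settable flag) and only queries the system when it is running on AIX. The function name fullcore_value is illustrative.

```shell
#!/bin/sh
# Read `lsattr -El sys0` output on stdin and print the value of fullcore.
# Assumes the attribute name is in column 1 and its value in column 2.
fullcore_value() {
    awk '$1 == "fullcore" { print $2 }'
}

# Only query the device attribute on AIX, where sys0 exists.
if [ "$(uname -s)" = "AIX" ]; then
    state=$(lsattr -El sys0 -a fullcore | fullcore_value)
    if [ "$state" = "true" ]; then
        echo "full core is enabled"
    else
        echo "full core is NOT enabled (root can run: chdev -l sys0 -a fullcore=true)"
    fi
fi
```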



DISK SPACE
Check your partitions where WebSphere Application Server resides and make sure there is enough space for the dump to be produced. Usually an error message will be seen in the native_stderr.log that indicates if the core was unable to be written.

To check all of your partitions, execute this command (the -k is for kilobytes):

df -k
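
To compare the free space against an expected dump size, something like the sketch below could be used. It calls df with -P so the output uses the one-line POSIX layout, in which the fourth column is the available space. The 4 GB figure is only an example and should be replaced with a size that fits the memory footprint of your JVM.

```shell
#!/bin/sh
# Print the free space, in kilobytes, of the file system holding a directory.
# -P requests the POSIX layout, where column 4 is the available space.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when the directory has at least the requested kilobytes free.
has_room() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example: is there room for a 4 GB (4194304 KB) core in the current directory?
if has_room . 4194304; then
    echo "enough space"
else
    echo "not enough space for a 4 GB core"
fi
```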


DISABLE SIGNAL HANDLERS
To force the operating system to handle all signals sent to the JVM process, you can disable all JVM signal handlers.

For IBM SDK 5.0 and later, set this JVM argument:
-Xrs

NOTE: On SDK 6.0 and later, to prevent unintentional crashes due to SIGTRAP, clear the shared class cache by executing <WAS_HOME>/bin/clearClassCache.sh



EXECUTE "pdump.sh" SCRIPT
In cases where core files are still not being produced, you can execute the attached script pdump.sh to extract information from the running process. This is especially helpful if you suspect the process is in a zombie state and does not respond to any signals.

You can download the latest version from this location:
ftp://ftp.software.ibm.com/aix/tools/debug/pdump.sh

pdump.sh <Java_PID>


This will create a file named pdump.java.###.txt. Locate the line containing the string "sigcatch". If SEGV is listed in the output, then the signal is being caught. Both SEGV and SIGSEGV represent signal 11.
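
The check for SEGV on the "sigcatch" line can be done with grep. The sketch below relies only on what is described above (a line containing "sigcatch" that may list SEGV) and makes no other assumption about the layout of the pdump output. The function name is illustrative.

```shell
#!/bin/sh
# Succeed when the sigcatch line of a pdump output file lists SEGV.
# Matching "SEGV" also matches "SIGSEGV", since both mean signal 11.
segv_caught() {
    grep 'sigcatch' "$1" | grep -q 'SEGV'
}
```

For example, segv_caught pdump.java.###.txt && echo "SEGV is being caught", with ### replaced by the value in the generated file name.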



Additional Questions:
What happens if I do not have write permission in the profile's root directory, or the directory I am redirecting javacores, heapdumps, and system core files to?

This will result in a failure when writing these files to the system. Check for an error in the native_stderr.log, as it may try to write the dump to an alternate folder (such as /tmp).
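
A quick way to rule out this problem ahead of time is to test the target directory as the user that runs the application server. This is a minimal sketch with an illustrative function name. Pass it the profile root or the directory the dumps are redirected to.

```shell
#!/bin/sh
# Succeed when the current user can create files in the given dump directory:
# it must exist, be writable, and be searchable.
can_write_dumps() {
    [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}

# Example: check the current working directory.
if can_write_dumps .; then
    echo "dump directory is writable"
else
    echo "dump directory is NOT writable"
fi
```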



Why are core files truncated at 2 GB even with all ulimit settings set to unlimited?

There is a limitation on 32-bit processes, which can be worked around by enabling large file support.
Using a 64-bit version of WebSphere Application Server also resolves this limitation, although if you run out of disk space the dump can still be truncated.



Can I test my configuration to see if a core can be generated?

Yes, you can simulate a crash by sending signal 11 to the JVM process. This will terminate the process.

kill -11 PID


An alternative is to use the gencore command. This produces a core file and keeps the process running.

gencore PID

Cross reference information
Segment Product Component Platform Version Edition
Application Servers WebSphere Application Server - Express Hangs/performance degradation AIX 7.0, 6.1
Application Servers Runtimes for Java Technology Java SDK AIX

Document information

More support for: WebSphere Application Server
Crash

Software version: 6.1, 7.0, 8.0, 8.5, 8.5.5, 9.0.0.0

Operating system(s): AIX

Software edition: Base, Express, Liberty, Network Deployment

Reference #: 1052642

Modified date: 28 September 2007