IBM Support

Why a Core File is not Created

Question & Answer


Question

This document describes:
  • Various scenarios that prevent a core dump from being generated when a process terminates abnormally
  • How to avoid these problems by using the chcore and syscorepath commands

Answer


Introduction

The default core dump facility on AIX normally creates a file named core in the current working directory for the process that terminated abnormally. If a core file is created successfully, a CORE_DUMP entry is written into the error report. Sometimes a core file is not created, and a CORE_DUMP_FAILED error might be added to the error report to log the failure. This error contains a reason code that can be used to help determine why the core file was not created. The reason code is an errno code, a system error code that is used to report errors from library functions. errno codes are listed in the AIX header file /usr/include/sys/errno.h.

Some causes of core dump failure can be avoided by configuring the core dump facility with the chcore command or the older syscorepath command. These commands let a user set up a directory where all core files are written. If the chcore -n on option is used, the syscorepath and chcore commands create unique core file names in the following format:
core.pid.ddhhmmss
  • pid: Process ID
  • dd: Day of the month
  • hh: Hour in 24-hour format
  • mm: Minutes
  • ss: Seconds
For details, see the man pages for chcore and syscorepath, and the AIX Core Dump Facility technical note.
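
The naming format above can be taken apart again when sorting through a directory of core files. The following is a minimal sketch, not part of AIX: the function name parse_core_name is hypothetical, and the sample file name is made up from the process ID and time in the example error entry in this document.

```shell
# Hypothetical helper: split a core.pid.ddhhmmss name into its parts.
# Prints one summary line, or returns non-zero if the name does not match.
parse_core_name() {
    # ${1##*/} drops any leading directory from the path
    printf '%s\n' "${1##*/}" | awk -F. '
        NF == 3 && $1 == "core" && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && length($3) == 8 {
            printf "pid=%s day=%s time=%s:%s:%s\n", $2,
                   substr($3, 1, 2), substr($3, 3, 2), substr($3, 5, 2), substr($3, 7, 2)
            ok = 1
        }
        END { exit !ok }
    '
}

# Illustrative file name only
parse_core_name /tmp/corefiles/core.57812.17141543
# prints: pid=57812 day=17 time=14:15:43
```

Because the name carries only the day of the month and the time, the month and year still have to come from the file's modification time or from the matching error report entry.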

CORE_DUMP_FAILED Error

The following output is an example CORE_DUMP_FAILED error. Note the REASON CODE field near the bottom of the entry.

LABEL:        CORE_DUMP_FAILED
IDENTIFIER:    45C7A35B

Date/Time:       Mon Jan 17 14:15:43 MST
Sequence Number: 39603
Machine Id:      0008ADAA4C00
Node Id:         p620
Class:           S
Type:            PERM
Resource Name:   SYSPROC        

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
INTERNAL SOFTWARE ERROR

User Causes
USER GENERATED SIGNAL

Failure Causes
CORE DUMP FAILED - SEE A REASON CODE BELOW

    Recommended Actions
    RERUN THE APPLICATION PROGRAM
    IF PROBLEM PERSISTS THEN DO THE FOLLOWING
    CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
          11
USER'S PROCESS ID:
       57812
REASON CODE
          11
USER ID
         232
PROCESSOR ID
           0
CORE FILE NAME
/u1/GA.PROD/core
PROGRAM NAME
uvsh

The SIGNAL NUMBER section contains the signal that caused the program to terminate. To list the signals, run the command kill -l. The CORE FILE NAME section contains the location and name of the core file that would have been written had the dump succeeded. The PROGRAM NAME section contains the name of the program that terminated. The REASON CODE section contains an errno value that can be used to diagnose the cause of the core dump failure. The errno values are defined in the file /usr/include/sys/errno.h. Only some of the errno codes are used as reason codes.
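
When many entries have to be checked, the Detail Data fields can be pulled out with a small filter. The sketch below assumes the layout shown in the example entry, where each label is on its own line and its value is on the next line. The function name errpt_field is hypothetical.

```shell
# Hypothetical helper: print the value that follows a Detail Data label.
# Reads an error report entry on standard input.
# Usage on AIX: errpt -aJ CORE_DUMP_FAILED | errpt_field "REASON CODE"
errpt_field() {
    awk -v label="$1" '
        # The previous line was the label: trim and print this line
        found { gsub(/^[ \t]+|[ \t]+$/, ""); print; found = 0; next }
        # Compare each line, minus trailing blanks and colon, with the label
        { line = $0; sub(/[ \t:]+$/, "", line); if (line == label) found = 1 }
    '
}

# Demonstration on a fragment of the example entry
printf 'REASON CODE\n          11\nCORE FILE NAME\n/u1/GA.PROD/core\n' |
    errpt_field "REASON CODE"
# prints: 11
```

If the input holds several entries, one value is printed per entry. A blank CORE FILE NAME is not handled specially, so whatever line follows the label is printed in that case.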

Note: On some older versions of AIX, the Probable Causes section contains the line "SYSTEM RUNNING OUT OF PAGING SPACE", and the Recommended Actions section contains the line "DEFINE ADDITIONAL PAGING SPACE". These messages are misleading and can be ignored.

errno Codes

Here are some of the errno codes that can appear as the reason code in a CORE_DUMP_FAILED error. The codes referenced in the failure scenarios below are EPERM (1), EAGAIN (11), EACCES (13), and ENOSPC (28).

#define EPERM   1       /* Operation not permitted              */ 
#define EIO     5       /* I/O error                            */
#define EAGAIN  11      /* Resource temporarily unavailable     */ 
#define EACCES  13      /* Permission denied                    */ 
#define EBUSY   16      /* Resource busy                        */
#define EEXIST  17      /* File exists                          */
#define ENFILE  23      /* Too many open files in system        */
#define EMFILE  24      /* Too many open files                  */
#define EFBIG   27      /* File too large                       */
#define ENOSPC  28      /* No space left on device              */ 
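
For quick lookups, the list above can be turned into a small shell function. This is only a sketch built from the values listed in this document, and the function name reason_name is hypothetical. On a live system the header file /usr/include/sys/errno.h remains the authoritative source.

```shell
# Hypothetical helper: translate a REASON CODE number into its errno name.
# Covers only the codes listed in this document.
reason_name() {
    case $1 in
        1)  echo "EPERM: Operation not permitted" ;;
        5)  echo "EIO: I/O error" ;;
        11) echo "EAGAIN: Resource temporarily unavailable" ;;
        13) echo "EACCES: Permission denied" ;;
        16) echo "EBUSY: Resource busy" ;;
        17) echo "EEXIST: File exists" ;;
        23) echo "ENFILE: Too many open files in system" ;;
        24) echo "EMFILE: Too many open files" ;;
        27) echo "EFBIG: File too large" ;;
        28) echo "ENOSPC: No space left on device" ;;
        *)  echo "unknown reason code: $1" ;;
    esac
}

reason_name 11
# prints: EAGAIN: Resource temporarily unavailable
```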

Failure Scenarios

The following table contains various scenarios that can keep a core file from being created when a process terminates abnormally. For each scenario, information is provided about the CORE_DUMP_FAILED error if one is added to the error report.

  • Scenario: There is not enough space in the file system to write the core file.
    CORE_DUMP_FAILED: REASON CODE ENOSPC 28

  • Scenario: The ulimit for core is set to 0 in the account where the program is running. This disables core file creation.
    CORE_DUMP_FAILED: REASON CODE EPERM 1. CORE FILE NAME is blank.

  • Scenario: The process sets a current working directory where it does not have write permissions. Since the core file is written into the current working directory, the core file cannot be written.
    Note: Use the chcore or syscorepath command to avoid this failure.
    CORE_DUMP_FAILED: REASON CODE EACCES 13. CORE FILE NAME is the path to where the system attempted to write the core file.

  • Scenario: By default, all core files that are generated on an AIX system have the name core. If one process is writing a core file and another process terminates and attempts to write a core file in the same directory, the file core is busy and the second process cannot write to it.
    Note: Use the chcore or syscorepath command and unique core file naming to avoid this failure.
    CORE_DUMP_FAILED: REASON CODE EAGAIN 11 or EACCES 13

  • Scenario: The process has set the SA_NODUMP flag in the call to sigaction(). You would need the source code for the program to verify that this is the reason for the core dump failure. Any program can prevent a core dump by setting this flag in a sigaction request.
    CORE_DUMP_FAILED: REASON CODE EPERM 1

  • Scenario: If the suid or sgid bit is set on the executable, it is possible that a core file will not be created. This can happen if the real user or group id is not identical to the effective user or group id.
    Note: See Example 1.
    CORE_DUMP_FAILED: REASON CODE EPERM 1. CORE FILE NAME is blank.

  • Scenario: A process attempts to write a core file into a directory where a core file already exists and the ownership and permissions on the file do not allow it to be overwritten.
    Notes: See Example 2. Use the chcore or syscorepath command to avoid this failure.
    CORE_DUMP_FAILED: REASON CODE EACCES 13. CORE FILE NAME is the path to where the system attempted to write the core file.

  • Scenario: A process attempts to write a core file into a directory where a core file already exists. This core file is owned by another user but has write permissions enabled on either group or other. The attempt to write the new core file results in the core file being zeroed out.
    Notes: See Example 3. Use the chcore or syscorepath command to avoid this failure.
    CORE_DUMP_FAILED: REASON CODE EPERM 1. CORE FILE NAME is the path to where the system attempted to write the core file.
    Note: Some versions of AIX might not add the CORE_DUMP_FAILED entry to the error report.

  • Scenario: A process traps the signal whose default action is to create a core file but does not call the abort() function to actually create the core file.
    CORE_DUMP_FAILED: None

  • Scenario: A process ignores a signal that would, by default, generate a core file.
    Note: See Example 4.
    CORE_DUMP_FAILED: None
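
Two of the scenarios above, the core ulimit of 0 and the unwritable working directory, can be checked from the shell in the account that runs the program. The following is a minimal sketch under that assumption. The function name core_precheck is hypothetical, and the checks reflect the shell that runs them, which might differ from the environment of a process started elsewhere.

```shell
# Hypothetical helper: report two conditions that keep a core file
# from being written. Prints nothing when neither condition is found.
core_precheck() {
    dir=${1:-.}

    # A core size limit of 0 disables core file creation
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        echo "ulimit: core size limit is 0, core files are disabled"
    fi

    # By default the core file is written into the working directory
    if [ ! -w "$dir" ]; then
        echo "directory: $dir is not writable by the current user"
    fi
}

# Check the current working directory
core_precheck .
```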

Example 1

If the suid or sgid bit is set on the executable, a core file might not be created. This can happen if the real user or group id is not identical to the effective user or group id. According to the man page for core, a core dump is not created if the saved user id and the effective user id are not the same, or if the saved group id and the effective group id are not the same.
chmod +s program.exe
This command turns on both suid and sgid. This prevents creation of a core file.

chmod u+s program.exe
This command will turn on only suid.

If sgid is turned on, the core file is not created, because the real group id and the effective group id are not the same.
  • Example A
    Permissions of program.exe are root:fnusr, 0755
    chmod +s program.exe
    Permissions of program.exe are root:fnusr, 6755
    From root, execute program.exe:
    Real/Saved user id     : root
    Effective user id      : root
    Real/Saved group id    : system
    Effective group id     : fnusr


    Note: The saved group id is not the same as the effective group id, so no core file is created.
  • Example B
    Permissions of program.exe are root:fnusr, 0755
    chmod u+s program.exe
    Permissions of program.exe are root:fnusr, 4755
    From root, execute program.exe:
    Real/Saved user id     : root
    Effective user id      : root
    Real/Saved group id    : system
    Effective group id     : system


    Note: The saved and effective user ids are the same, and the saved and effective group ids are the same, so a core file is created.
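
To see whether an executable is a candidate for this scenario, its mode bits can be tested from the shell. This sketch only reports the suid and sgid bits. Whether a core file is actually suppressed still depends on the ids of the user who runs the program, as Examples A and B show. The function name suid_sgid_bits is hypothetical.

```shell
# Hypothetical helper: report which of the suid and sgid bits are set.
suid_sgid_bits() {
    f=$1
    bits=""
    # -u is true when the set-user-ID bit is set
    [ -u "$f" ] && bits="suid"
    # -g is true when the set-group-ID bit is set
    [ -g "$f" ] && bits="${bits:+$bits }sgid"
    echo "${bits:-none}"
}

# Prints "suid", "sgid", "suid sgid", or "none"
suid_sgid_bits /bin/sh
```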

Example 2

A process attempts to write a core file into a directory where a core file already exists, and the ownership and permissions on the file do not allow it to be overwritten.

$ ls -l core
-rw-r--r--   1 rej      staff        769727 Oct 04 08:59 core
$ id
uid=709(chris) gid=1(staff)
$ sleep 100 &
[1]     352458
$ kill -6 352458
$
[1] + IOT/Abort trap           sleep 100 &
$ ls -l core
-rw-r--r--   1 rej      staff        769727 Oct 04 08:59 core

$ errpt -aJ CORE_DUMP_FAILED
---------------------------------------------------------------------------
LABEL:          CORE_DUMP_FAILED
IDENTIFIER:     FAA1D46F

Date/Time:       Tue Oct  4 09:04:01 CDT 2005
Sequence Number: 543
Machine Id:      000870664C00
Node Id:         vegas
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
INTERNAL SOFTWARE ERROR

User Causes
USER GENERATED SIGNAL

Failure Causes
CORE DUMP FAILED - SEE A REASON CODE BELOW

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
           6
USER'S PROCESS ID:
                352458
REASON CODE
          13
USER ID
         709
PROCESSOR ID
          -1
CORE FILE NAME
/home/chris/core
PROGRAM NAME
sleep

Example 3

A process attempts to write a core file into a directory where a core file already exists. This core file is owned by another user but has write permissions enabled on either group or other. The attempt to write the new core file results in the core file being zeroed out.

$ ls -l core
-rw-rw-r--   1 rej      staff        769727 Oct 04 08:49 core
$ id
uid=709(chris) gid=1(staff)
$ sleep 100 &
[1]     237786
$ kill -6 237786
$
[1] + IOT/Abort trap           sleep 100 &
$ ls -l core
-rw-rw-r--   1 rej      staff             0 Oct 04 08:52 core

$ errpt -aJ CORE_DUMP_FAILED
---------------------------------------------------------------------------
LABEL:          CORE_DUMP_FAILED
IDENTIFIER:     FAA1D46F

Date/Time:       Tue Oct  4 08:52:36 CDT 2005
Sequence Number: 541
Machine Id:      000870664C00
Node Id:         vegas
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
INTERNAL SOFTWARE ERROR

User Causes
USER GENERATED SIGNAL

Failure Causes
CORE DUMP FAILED - SEE A REASON CODE BELOW

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
           6
USER'S PROCESS ID:
                237786
REASON CODE
           1
USER ID
         709
PROCESSOR ID
          -1
CORE FILE NAME
/home/chris/core
PROGRAM NAME
sleep
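
The situations in Example 2 and Example 3 can be spotted in advance by looking at an existing core file in the target directory. The following sketch compares the numeric owner of the file with the current user and then tests for write access. The function name existing_core_status and its four result words are hypothetical.

```shell
# Hypothetical helper: classify an existing file named core in a directory.
#   none              no core file exists
#   own               the file belongs to the current user
#   foreign-readonly  owned by another user, not writable (as in Example 2)
#   foreign-writable  owned by another user, writable (as in Example 3)
existing_core_status() {
    f=${1:-.}/core
    if [ ! -e "$f" ]; then
        echo "none"
        return 0
    fi
    # ls -n prints the numeric user id in the third column
    owner=$(ls -ldn "$f" | awk '{ print $3 }')
    if [ "$owner" = "$(id -u)" ]; then
        echo "own"
    elif [ -w "$f" ]; then
        echo "foreign-writable"
    else
        echo "foreign-readonly"
    fi
}

# Check the current working directory
existing_core_status .
```

Run the check as the user that owns the failing process, because both the ownership comparison and the write test depend on who runs them.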

Example 4

A process ignores a signal that would, by default, generate a core file. To determine whether a signal is ignored, use the procsig command.

This command will list all signal actions defined for process 237786:

procsig 237786
The output of this command might look like this:
HUP         caught                  
INT         caught                  
QUIT        caught                  
ILL         caught                  
TRAP        caught                  
ABRT        caught                  
EMT         caught                  
FPE         caught                  
KILL        default  RESTART        
BUS         caught                  
SEGV        default            
SYS         caught              
PIPE        caught              
ALRM        caught              
TERM        ignored            
URG         default            
STOP        default            
TSTP        ignored            
CONT        default            
...
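
To narrow the listing to the signals that matter here, the output can be filtered. This sketch assumes the two-column layout shown above. It also assumes that QUIT, ILL, TRAP, ABRT, EMT, FPE, BUS, SEGV, and SYS are the signals whose default action writes a core file. Adjust that list for the system in question. The function name non_default_core_signals is hypothetical.

```shell
# Hypothetical helper: from procsig-style output on standard input, print
# the core-generating signals whose action is not "default".
# Usage on AIX: procsig 237786 | non_default_core_signals
non_default_core_signals() {
    # Assumption: signals whose default action writes a core file
    sigs="QUIT ILL TRAP ABRT EMT FPE BUS SEGV SYS"
    awk -v sigs="$sigs" '
        BEGIN { n = split(sigs, a, " "); for (i = 1; i <= n; i++) dumps[a[i]] = 1 }
        ($1 in dumps) && $2 != "default" { print $1, $2 }
    '
}

# Demonstration on three lines in the same layout
printf 'ABRT        caught\nSEGV        default\nTERM        ignored\n' |
    non_default_core_signals
# prints: ABRT caught
```

A signal reported as caught produces a core file only if the handler goes on to call abort(). A signal reported as ignored never produces one.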

chcore and syscorepath

To avoid some of the problems that prevent a core file from being generated, use the chcore or syscorepath command to direct core files into a user-specified directory. In this example, the directory where the core files are written is /tmp/corefiles.

chcore -p on -n on -l /tmp/corefiles -d

The older syscorepath command can also be used to direct core files to a central location. Unlike chcore, syscorepath can be used to generate core files from suid and sgid executable files.

syscorepath -p /tmp/corefiles

For more details, see the man pages for these commands and the AIX Core Dump Facility technical note.
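
A central core directory only helps if it exists, can be written to, and has free space. Otherwise the EACCES and ENOSPC failures described earlier simply move to the new location. The following sketch checks those three conditions before the directory is configured. The function name core_dir_ready and the minimum-space argument are hypothetical, and the write test applies to the user who runs the check rather than to every user whose process might dump core.

```shell
# Hypothetical helper: check that a directory is usable for core files.
# Arguments: directory, minimum free space in KB (defaults to 0).
# Usage: core_dir_ready /tmp/corefiles 1048576
core_dir_ready() {
    dir=$1
    min_kb=${2:-0}
    [ -d "$dir" ] || { echo "$dir: not a directory"; return 1; }
    [ -w "$dir" ] || { echo "$dir: not writable"; return 1; }
    # df -P prints one line per file system; column 4 is the free space in KB
    avail=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$min_kb" ] || { echo "$dir: only $avail KB free"; return 1; }
    echo "$dir: ok"
}
```

The amount of free space needed depends on the size of the processes that might dump core, so the threshold is left to the administrator.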

Conclusion

Normally a core file is written when a process terminates abnormally. The core file can be analyzed to help determine why the process failed. However, there are a number of scenarios that will prohibit a core file from being created. In some of these cases, a CORE_DUMP_FAILED entry is written into the error report. The REASON CODE section in this entry can be used to determine why the core file was not created. For cases where a CORE_DUMP_FAILED entry is not written into the error report, the running process, the process executable file, or the process source code must be investigated to determine why a core file was not generated.


Document Information

Modified date:
06 December 2019

UID

isg3T1011240