IBM Support

DB2 instance shutdown due to SIGKILL generated by the Out-of-Memory (OOM) killer on Linux platforms.

Technote (troubleshooting)


Problem (Abstract)


In the db2diag.log entries, the data "0900 0000" (signal 9, which is SIGKILL) shows that a DB2 process (PID = 1122) was killed with signal 9.
The instance crashes if SIGKILL is issued against any DB2 engine process.
DB2 does not issue signal 9 against its own engine processes.

SIGKILL cannot be caught, so there is no signal handler routine for it and no trap or core file can be generated.
The signal must have been issued manually by a user, programmatically by a user application, or by the operating system.
In the first case, the kill command might be recorded in the user's shell history file.
Because something external to DB2 caused the crash, DB2 cannot record which application, user, or operating system event brought it down.
The watchdog process (db2wdog) handles cleanup after abnormal termination (here, signal 9) of the main engine process and all FMPs. DB2 is the victim.

Symptom

DB2 Instance Shutdown.

The key entries in db2diag.log look like this:

-------------------------------------------------------------------------------------------------------------------------------------------------------
2010-10-09-01.06.16.347313+660 E13087E552 LEVEL: Severe
PID : 1120 TID : 46912711420224 PROC : db2wdog 0
INSTANCE: db2inst1 NODE : 000
EDUID : 2 EDUNAME: db2wdog 0
FUNCTION: DB2 UDB, base sys utilities, sqleWatchDog, probe:20
MESSAGE : ADM0503C An unexpected internal processing error has occurred.
ALL DB2 PROCESSES ASSOCIATED WITH THIS INSTANCE HAVE BEEN SHUTDOWN.

Diagnostic information has been recorded. Contact IBM Support for further assistance.

2010-10-09-01.06.17.119332+660 E13640E422 LEVEL: Error
PID : 1120 TID : 46912711420224 PROC : db2wdog 0
INSTANCE: db2inst1 NODE : 000
EDUID : 2 EDUNAME: db2wdog 0
FUNCTION: DB2 UDB, base sys utilities, sqleWatchDog, probe:21
DATA #1 : Process ID, 4 bytes
1122
DATA #2 : Hexdump, 8 bytes
0x00002AAAB77FC378 : 0201 0000 0900 0000
-------------------------------------------------------------------------------------------------------------------------------------------------------
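The signal number can be read out of the DATA #2 hexdump mechanically. The following is a minimal sketch, not a DB2 tool: the function name decode_wdog_hexdump is hypothetical, and it assumes the 8 bytes are two little-endian 32-bit integers with the signal number in the second one. The technote only confirms that "0900 0000" means signal 9; the meaning of the first word is not stated here.

```python
import signal
import struct


def decode_wdog_hexdump(hexdump: str) -> tuple[int, int, str]:
    """Decode an 8-byte hexdump such as '0201 0000 0900 0000'.

    Assumption: two little-endian 32-bit integers, the second being the
    signal number. Only the second word's meaning is confirmed by the text.
    """
    raw = bytes.fromhex(hexdump.replace(" ", ""))
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    first_word, signum = struct.unpack("<II", raw)
    try:
        # Map the number to a symbolic name using the local platform's table.
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown"
    return first_word, signum, name


first_word, signum, name = decode_wdog_hexdump("0201 0000 0900 0000")
print(f"first word=0x{first_word:08x} signal={signum} ({name})")
```

On Linux, signal 9 maps to SIGKILL, which matches the interpretation given in the Problem section.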


Cause

When the Linux kernel runs out of memory, it invokes the OOM (Out-Of-Memory) killer, which terminates processes to free enough memory to keep the system operational. In this scenario, the OOM killer killed process 1126 (db2sysc). This happens when all available memory, including disk swap space, has been allocated; current usage can be checked with the 'free' command.
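As a scripted alternative to running 'free' by hand, the same figures can be read from /proc/meminfo. The sketch below is illustrative only: parse_meminfo and swap_exhausted are hypothetical helper names, the swap values in the sample are taken from the kernel log in this technote, and the MemFree value is made up for the example.

```python
import re


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value} (kB for sizes)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([^:]+):\s+(\d+)(?:\s+kB)?\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def swap_exhausted(fields: dict[str, int], threshold_kb: int = 0) -> bool:
    """True if swap is configured and free swap is at or below the threshold."""
    return fields.get("SwapTotal", 0) > 0 and fields.get("SwapFree", 0) <= threshold_kb


# Sample input: swap values from the log excerpt; MemFree is hypothetical.
sample = """MemFree:           12345 kB
SwapTotal:       4194296 kB
SwapFree:              0 kB
"""
fields = parse_meminfo(sample)
print("swap exhausted:", swap_exhausted(fields))
```

On a live system, pass the contents of /proc/meminfo instead of the sample string, for example open("/proc/meminfo").read().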


Environment

This issue occurs only for DB2 running on supported Linux platforms.

Diagnosing the problem

Traces of the OOM killer can be found in the operating system log /var/log/messages or in the output of the dmesg command. An out-of-memory condition means that all available memory, including disk swap space, has been allocated.
The following is an example excerpt from /var/log/messages:
-------------------------------------------------------------------------------------------------------------------------------------------------------
Oct 9 01:06:09 lqportdb1 kernel: db2sysc invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Oct 9 01:06:09 lqportdb1 kernel: Call Trace: <ffffffff8015d94e>{oom_kill_process+87}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff8015dd82>{out_of_memory+299} <ffffffff8015f96b>{__alloc_pages+600}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff801612b8>{__do_page_cache_readahead+265} <ffffffff80137446>{del_timer_sync+12}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff802e4431>{schedule_timeout+146} <ffffffff88012c00>{:dm_mod:dm_any_congested+61}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff8015d079>{filemap_nopage+336} <ffffffff8016b068>{__handle_mm_fault+830}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff801455d8>{lock_hrtimer_base+37} <ffffffff802e78bf>{do_page_fault+2919}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff802e4870>{schedule_hrtimer+49} <ffffffff8014580d>{hrtimer_nanosleep+130}
Oct 9 01:06:09 lqportdb1 kernel: <ffffffff8010a883>{error_exit+0}
.
.
Oct 9 01:06:09 lqportdb1 kernel: Free swap = 0kB
Oct 9 01:06:09 lqportdb1 kernel: Total swap = 4194296kB
Oct 9 01:06:09 lqportdb1 kernel: Free swap: 0kB
Oct 9 01:06:09 lqportdb1 kernel: 2099200 pages of RAM
Oct 9 01:06:09 lqportdb1 kernel: 41113 reserved pages
Oct 9 01:06:09 lqportdb1 kernel: 69027 pages shared
Oct 9 01:06:09 lqportdb1 kernel: 191 pages swap cached
Oct 9 01:06:09 lqportdb1 kernel: Out of Memory: Kill process 1121 (db2syscr) score 79429 and children.
Oct 9 01:06:09 lqportdb1 kernel: Out of memory: Killed process 1126 (db2sysc).
-------------------------------------------------------------------------------------------------------------------------------------------------------
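Searching a large log for these entries can be automated. The following sketch is one possible approach, assuming the message wording shown in the excerpt above; other kernel versions may word these lines differently, and find_oom_events is a hypothetical helper name.

```python
import re

# Patterns based on the message wording in the excerpt above (an assumption:
# other kernel versions may format these lines differently).
INVOKED = re.compile(r"kernel: (\S+) invoked oom-killer")
KILLED = re.compile(r"Out of [Mm]emory: Killed process (\d+) \(([^)]+)\)")


def find_oom_events(lines):
    """Return (event, pid, process name) tuples for OOM killer log lines."""
    events = []
    for line in lines:
        match = INVOKED.search(line)
        if match:
            events.append(("invoked", None, match.group(1)))
            continue
        match = KILLED.search(line)
        if match:
            events.append(("killed", int(match.group(1)), match.group(2)))
    return events


sample = [
    "Oct 9 01:06:09 lqportdb1 kernel: db2sysc invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0",
    "Oct 9 01:06:09 lqportdb1 kernel: Out of Memory: Kill process 1121 (db2syscr) score 79429 and children.",
    "Oct 9 01:06:09 lqportdb1 kernel: Out of memory: Killed process 1126 (db2sysc).",
]
for event in find_oom_events(sample):
    print(event)
```

To scan a real log, pass an open file object, for example find_oom_events(open("/var/log/messages")); reading that file usually requires root privileges.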

Resolving the problem

As described above, the Linux OOM killer is the cause of the DB2 problem. As a first step, have a Linux system administrator review system memory usage and verify that enough memory, including disk swap space, is available. Most Linux kernels now allow the OOM killer to be tuned; a Linux system administrator should review the options and determine the appropriate settings.
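Before changing any tuning, it can help to see how the kernel currently rates a process. The read-only sketch below assumes a kernel that exposes /proc/<pid>/oom_score and /proc/<pid>/oom_score_adj; older kernels, such as the one in the excerpt above that reports oomkilladj, may expose a different interface (for example /proc/<pid>/oom_adj). The helper name read_oom_settings and its proc_root parameter are hypothetical, and the sketch does not recommend any particular setting.

```python
import os
from pathlib import Path


def read_oom_settings(pid: int, proc_root: str = "/proc") -> dict:
    """Read the OOM score and adjustment for a PID; None if unavailable."""
    base = Path(proc_root) / str(pid)
    result = {}
    for name in ("oom_score", "oom_score_adj"):
        try:
            result[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            # File missing (older kernel, non-Linux system) or unreadable.
            result[name] = None
    return result


# Inspect the current process; substitute the PID of db2sysc on a real system.
print(read_oom_settings(os.getpid()))
```

A higher oom_score means the kernel is more likely to select that process when memory runs out.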

Document information

More support for: DB2 for Linux, UNIX and Windows
Database Objects/Config - Instance

Software version: 9.7, 9.8, 10.1, 10.5

Operating system(s): Linux

Reference #: 1449871

Modified date: 15 January 2013

