The WebSphere MQ queue manager hangs, and the operating system eventually generates a dump with the following title:
Dump Title: END OF MEMORY RESOURCE MANAGER HANG DETECTED: TCB = tcb_address, NAME = IEFJRECM - SC1B6
The dump appears only at the end of the hang, when the queue manager terminates. Before that point, the following messages might indicate the problem:
CSQJ111A OUT OF SPACE IN ACTIVE LOG DATA SETS
CSQ3201E ABNORMAL EOT IN PROGRESS FOR USER=
CSQV086E QUEUE MANAGER ABNORMAL TERMINATION
CICS transactions can appear hung and might not be purgeable. Queue manager logs can begin to fill, or become completely full, as CSQJ111A indicates. When the dump is taken, the system reports it with message IEA794I:
IEA794I SVC DUMP HAS CAPTURED: DUMP TITLE=END OF MEMORY RESOURCE MANAGER HANG DETECTED: TCB tcb_address NAME = IEFJRECM - SC1B6
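The messages listed above can be picked out of a saved job log or syslog extract with a short script. The following is a minimal sketch, assuming the log has already been copied to a text source with one message per line; the function name find_symptoms and the sample lines are hypothetical and are not part of WebSphere MQ or z/OS.

```python
import re

# Message IDs named in this document as symptoms of the hang.
SYMPTOM_IDS = ("CSQJ111A", "CSQ3201E", "CSQV086E", "IEA794I")

# Match any of the IDs as a whole word, wherever it appears on the line.
_PATTERN = re.compile(r"\b(" + "|".join(SYMPTOM_IDS) + r")\b")


def find_symptoms(log_lines):
    """Return (line_number, message_id) pairs for each symptom message found."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = _PATTERN.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# Hypothetical log extract, for illustration only.
sample_log = [
    "CSQJ111A OUT OF SPACE IN ACTIVE LOG DATA SETS",
    "SOME UNRELATED MESSAGE",
    "CSQV086E QUEUE MANAGER ABNORMAL TERMINATION",
]

for line_number, message_id in find_symptoms(sample_log):
    print(line_number, message_id)
```

Seeing CSQJ111A well before CSQV086E in such a scan is consistent with the sequence described here, but it does not by itself confirm the cause.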
A low-priority batch job that held a queue manager latch was put into a long swap by the operating system, causing latch contention within the queue manager. The job had been logically swapped out because it was waiting for an HSM backup to complete; the backup already held an ENQUEUE on a VTOC volume that the job needed. Queued behind the HSM backup was another job running a CDS backup to the same volume. The CDS backup blocked all tape activity, so the HSM backup could not complete, and the result was a deadlock.
Diagnosing the problem
The END OF MEMORY RESOURCE MANAGER dump can be used to determine which threads are active in the queue manager, which latches are held, and which latches are being waited on. Once the order of waiters is determined, check the output of the IPCS command VERBX SRMDATA to see which latch holder is swapped out. While that holder remains swapped out, the queue manager waits on a latch that cannot be freed until the operating system swaps the holder back in.
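The reasoning in this step, following each waiter to the holder of the latch it wants and then checking which holder is swapped out, can be sketched in a few lines. This is only an illustration: it assumes the waiter-to-holder pairs and the list of swapped-out address spaces have been transcribed by hand from the dump and from the VERBX SRMDATA output. The function name trace_latch_chain and all task names are hypothetical.

```python
def trace_latch_chain(waits_on, swapped_out, start):
    """Follow waiter -> holder links from `start`.

    waits_on:    dict mapping each waiter to the holder of the latch it wants
    swapped_out: set of names reported as swapped out
    Returns (chain, blocked_by), where chain is the order of waiters and
    blocked_by lists the members of the chain that are swapped out.
    """
    chain = [start]
    current = start
    while current in waits_on:
        current = waits_on[current]
        if current in chain:
            # The chain loops back on itself: a latch cycle inside the
            # queue manager rather than a stall behind a swapped-out holder.
            chain.append(current)
            break
        chain.append(current)
    # dict.fromkeys removes duplicates while keeping the chain order.
    blocked_by = [name for name in dict.fromkeys(chain) if name in swapped_out]
    return chain, blocked_by


# Hypothetical data transcribed from a dump, for illustration only.
waits_on = {"CICS_TASK": "QMGR_THREAD", "QMGR_THREAD": "BATCH_JOB"}
swapped_out = {"BATCH_JOB"}

chain, blocked_by = trace_latch_chain(waits_on, swapped_out, "CICS_TASK")
print(" -> ".join(chain))
print("swapped out:", blocked_by)
```

A chain that ends at a swapped-out holder matches the situation described in this document. A chain that loops back on itself would point to a different problem.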
Resolving the problem
In this case, HSM had to be recycled to clear the deadlock. In the long term, processes that are interlocked in this way should be separated to avoid such a single point of failure.