A fix is available
APAR status
Closed as program error.
Error description
A task has been the subject of a deferred FORCEPURGE while it was in a running state. The running task then actioned this purge as it suspended on a lock manager lock. DFHDSTCB routine SLEEP_CS does this by calling DFHDSDS4 after updating the state of the task to PURGE_PENDING. In parallel, another task (running on its own L8/L9 TCB) was issuing a resume as it released the lock on which the first task had suspended. This produces a race condition in which both the PURGE logic and the RESUME logic place the same task on the L8/L9 TCB's dispatchable chain.

The suspending task relinquishes control to the dispatcher. DFHDSTCB SLEEP_CS (running under the default DSTCB housekeeping task for the running TCB) discovers that there is a purge pending which is actionable. This has to be a FORCEPURGE, as the suspend was from lock manager, which prohibits normal purges. The PURGE_STATUS of the task is updated to PURGE_PENDING using compare and swap (CS), and DFHDSDS4 ?PURGE is invoked. This calls SUSPEND_TOKENS_PURGE (as the task has issued DSSR SUSPEND), which detects that the suspend token used on the suspend is in a SUSPENDED state. The code then sets the state of the suspend token to PURGED using CS.

However, just before this CS operation, there must have been a DSSR RESUME from the task releasing the lock. This executes code in DFHDSSR RESUME_TASK_PROC, which also discovers that the suspend token is in a SUSPENDED state. This code also updates the state of the suspend token, from SUSPENDED to RESET, but it does not use CS. The CS operation in DFHDSDS4 must have worked, and the update in DFHDSSR must have followed. DFHDSSR failed to detect that the suspend token state had changed underneath its feet.

DFHDSSR now calls WAKE_TASK, which in turn drives DFHDSWKT. At this moment, the task's PURGE_STATUS is PURGE_PENDING. DFHDSWKT has code to detect this and stop RESUME processing from adding the task to the dispatchable chain (by setting RETC=1). However, in parallel with the call to DFHDSWKT by RESUME processing, the DFHDSDS4 PURGE logic continues to execute on the other L8/L9 TCB. As DFHDSDS4 believes that the state of the suspend token is now PURGED, it proceeds to update the task's PURGE_STATUS to PURGED. The TASK_STATE is set to DISPATCHABLE and the task is placed on the dispatchable chain.

DFHDSWKT now runs (under RESUME logic). The target task is now in a DISPATCHABLE/PURGED state. DFHDSWKT has no code fragment to handle this state, so it drops into the OTHERWISE clause at the end of the code. This returns to WAKE_TASK, leaving RETC=0, which signals to WAKE_TASK that it should place the task on the dispatchable chain. The task is now on the L8/L9 TCB's dispatchable chain twice.

Additional symptoms: CICS unresponsive, at maxtask with many CWXN transactions. A dump at the time showed no task running on the QR TCB, but the KTCB for the QR TCB was running. Its last stack entry showed it was processing in DFHDSTCB - DOUBLE_CHAIN_SORT_MERGE. The SYSTRACE showed the QR TCB in a tight loop due to a DTA+30 pointing to itself. This was at offsets x'263E' to x'2654' into DFHDSTCB at PTF UK57632. The problem involves timing: a task is suspended and resumed at the same time, which puts its DTA on the dispatchable chain when it is already executing. The internal trace also shows trace entries for a task that appears to be running on the QR TCB, but the TCB address is not that of the QR TCB. In this case it was an L8 TCB.

mxt hang hung stall forcepurge lmqueue
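The tight loop caused by a DTA pointing to itself can be illustrated with a small model. This is only a sketch in Python, not CICS code: the `Dta` and `Chain` classes, the `next` field (standing in for the forward pointer that the dump shows at DTA+30), and the `walk` helper are all hypothetical names, and the real dispatchable chain layout is assumed rather than documented here.

```python
class Dta:
    """Hypothetical stand-in for a dispatcher task area (DTA)."""

    def __init__(self, name):
        self.name = name
        self.next = None  # assumed forward chain pointer (the DTA+30 field)


class Chain:
    """Hypothetical singly linked dispatchable chain with a tail pointer."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, dta):
        # No check is made for a DTA that is already on the chain.
        dta.next = None
        if self.tail is None:
            self.head = dta
        else:
            self.tail.next = dta  # if tail is dta, this makes dta.next = dta
        self.tail = dta


def walk(chain, limit=10):
    """Follow the chain, stopping after 'limit' entries to avoid looping."""
    seen = []
    node = chain.head
    while node is not None and len(seen) < limit:
        seen.append(node.name)
        node = node.next
    return seen
```

Appending the same `Dta` twice leaves its `next` field pointing at itself, so `walk` only ends because of the `limit` guard. Code that scans the chain without such a guard would spin, which is consistent with the loop reported in DOUBLE_CHAIN_SORT_MERGE.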
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All CICS users.                              *
****************************************************************
* PROBLEM DESCRIPTION: Abend 0C4 in DFHDSTCB after a deferred  *
*                      FORCEPURGE request is processed.        *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A task has been the subject of a deferred FORCEPURGE. After the task suspends, CICS attempts to process the purge. In routine SLEEP_CS, the state of the task is updated to PURGE_PENDING, and DFHDSDS4 is called. In parallel, another task on an open TCB releases the lock on which the first task was waiting, and issues a resume. A race condition now exists between the PURGE and the RESUME logic. This results in a task being placed on its TCB's dispatchable chain twice, and leads to the eventual CICS abend.
Problem conclusion
DFHDSSR has been modified to use compare and swap when modifying the task's resume state. This prevents concurrent tasks affecting each other in this adverse way.
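The effect of the change can be sketched as follows. This is an illustrative model in Python, not the DFHDSSR code: Python has no compare and swap instruction, so a lock-protected `compare_and_swap` method stands in for CS, and the class, function, and variable names are hypothetical. The flow through WAKE_TASK and DFHDSWKT is also collapsed into a single append to the chain.

```python
import threading

SUSPENDED, PURGED, RESET = "SUSPENDED", "PURGED", "RESET"


class SuspendToken:
    """Hypothetical suspend token whose state can be swapped atomically."""

    def __init__(self):
        self.state = SUSPENDED
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Stand-in for CS: update only if the state is still 'expected'.
        with self._lock:
            if self.state != expected:
                return False
            self.state = new
            return True


def purge(token, chain, task):
    # PURGE logic: already uses CS to move SUSPENDED -> PURGED.
    if token.compare_and_swap(SUSPENDED, PURGED):
        chain.append(task)


def resume_without_cs(token, chain, task, observed_state):
    # Behaviour before the fix: acts on a state that was read earlier.
    if observed_state == SUSPENDED:
        token.state = RESET  # plain store, overwrites PURGED unnoticed
        chain.append(task)


def resume_with_cs(token, chain, task):
    # Behaviour after the fix: the swap fails if PURGE got there first.
    if token.compare_and_swap(SUSPENDED, RESET):
        chain.append(task)


def simulate(fixed):
    """Replay the interleaving: RESUME reads the state, then PURGE wins."""
    token = SuspendToken()
    chain = []
    observed = token.state          # RESUME sees SUSPENDED
    purge(token, chain, "TASK1")    # PURGE swaps the token to PURGED
    if fixed:
        resume_with_cs(token, chain, "TASK1")
    else:
        resume_without_cs(token, chain, "TASK1", observed)
    return chain
```

In this model, `simulate(False)` leaves the task on the chain twice, while `simulate(True)` leaves it there once, because the second state change is rejected when the token is no longer SUSPENDED.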
Temporary fix
FIX AVAILABLE BY PTF ONLY
Comments
APAR Information
APAR number
PM57226
Reported component name
CICS TS Z/OS V4
Reported component ID
5655S9700
Reported release
600
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-02-01
Closed date
2012-05-08
Last modified date
2013-09-16
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK78710 UK78711 PM97098
Modules/Macros
DFHDSSR
Fix information
Fixed component name
CICS TS Z/OS V4
Fixed component ID
5655S9700
Applicable component levels
Document Information
Modified date:
16 September 2013