A fix is available
APAR status
Closed as program error.
Error description
A database quiesce fails because an IMS goes down, leaving the affected databases stuck in the database quiesce state. The only way to clear that state is to recycle the RMs and IMSs. The V12 base code will be retrofitted to V11.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: IMS V11 users of IMSplex-wide                *
*                 processes using RM, such as database         *
*                 quiesce, global online change, and           *
*                 UPD IMS SET(PLEXPARM) command                *
****************************************************************
* PROBLEM DESCRIPTION: An UPDATE DB STOP(QUIESCE) command      *
*                      fails with reason code 4124 after a     *
*                      previous UPDATE DB START(QUIESCE)       *
*                      command failed because an IMS           *
*                      came down.                              *
****************************************************************
* RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
****************************************************************

If an RM client participating in a global process step terminates normally or abnormally, or its SCI goes down in the middle of the process step, the process step hangs until it times out. IMS functions that use RM to coordinate IMSplex-wide processes are never returned a response. The IMS functions that use the RM IMSplex-wide process function include:
- database quiesce (UPD DB START or STOP QUIESCE command)
- global & ACBLIB member online change (INITIATE OLC command)
- UPD IMS SET(PLEXPARM) command

The process step timeout value is set to be the same as the timeout value specified on the command that initiated the RM IMSplex-wide process, so the process step could time out long after the RM client or its SCI went down.

In the case of database quiesce, IMS databases are left in a database quiesce state, with no way to remove the quiesce state other than recycling the RMs and IMSs and scratching the resource structure.
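The timing behavior described above can be illustrated with a toy model. This is only a sketch and not IMS or RM code: the function name, its parameters, and the rule that a step completes once every client has either replied or been answered for are assumptions made for illustration. With respond_on_member_down=False it models the hang until timeout; with True it models the behavior requested under Problem conclusion, where a response is issued on behalf of a client that went down.

```python
from typing import Dict, Optional


def time_until_step_completes(timeout: float,
                              reply_at: Dict[str, Optional[float]],
                              down_at: Dict[str, float],
                              respond_on_member_down: bool) -> float:
    """Return the simulated time at which the process step gets a result.

    timeout   -- timeout value inherited from the initiating command
    reply_at  -- hypothetical map of client name to the time it replied,
                 or None if it never replied
    down_at   -- hypothetical map of client name to the time it (or its
                 SCI) went down
    """
    finish = 0.0
    for client, replied in reply_at.items():
        if replied is not None:
            client_done = replied
        elif respond_on_member_down and client in down_at:
            # A response is issued on the failed client's behalf.
            client_done = down_at[client]
        else:
            # No response ever arrives, so the step waits for the timeout.
            client_done = timeout
        finish = max(finish, client_done)
    return min(finish, timeout)


replies = {"IMS1": 2.0, "IMS2": None}   # IMS2 never replies
downs = {"IMS2": 5.0}                   # IMS2 went down 5 seconds in

print(time_until_step_completes(300.0, replies, downs, False))  # 300.0
print(time_until_step_completes(300.0, replies, downs, True))   # 5.0
```

In this model a client that goes down 5 seconds into a step with a 300 second timeout holds the step for the full 300 seconds unless a response is issued for it.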
Problem conclusion
GEN: KEYWORDS: *** END IMS KEYWORDS ***

RM needs to be changed to be more usable in this situation. RM should return a non-zero completion code for the RM client that went down right away, rather than waiting for the process step to time out. This would enable the RM client to take action, such as terminating the IMSplex-wide process or cleaning up on behalf of the failed client. This would also improve IMS availability, because the command would fail as soon as an IMS, RM, or SCI failed, freeing IMS up to do other work.

Database quiesce needs to be changed to detect that the command failed because an IMS or RM terminated or is not reachable through SCI, and to terminate the database quiesce process and remove databases from the database quiesce state.

This fix does not address the case where none of the RMs in the IMSplex are available after the database quiesce process is started.

This fix does not address the case where the database quiesce command master goes down in the middle of a database quiesce command.

CSLRAWDR adds a new AWE function for client not reachable.

CSLRAWXI adds equates for the new AWE process functions to CSLRPR20, to clean up processes for a terminated member that is involved in IMSplex-wide processes.

CSLRCODE adds the client not reachable RM code.

CSLRPRSR adds bits to indicate member terminated normally, member terminated abnormally, and member not reachable through SCI.

CSLRPR20 adds logic to handle new functions for member terminated normally, member terminated abnormally, and member not reachable through SCI, by providing a response to an IMSplex-wide process that is in progress, rather than waiting for the process to time out.

CSLRPR30 adds logic to set new completion codes for the cases where the member terminated normally, the member terminated abnormally, and the member is not reachable.
CSLRREG0 adds code to enqueue an AWE to CSLRPR20 to check if a terminated member is participating in an IMSplex-wide process, so that a response can be issued on that client's behalf. This enables the client, such as global online change or database quiesce, to take action and clean up.

CSLRRR adds new reason codes for member abended, member shutdown, and member not reachable through SCI.

CSLRTM10 is recompiled for the CSLRAWDR change.

CSLRTM20 is recompiled for the CSLRAWDR change.

CSLRXNF0 adds a check for member not reachable, in order to set a unique function code for CSLRREG0 to differentiate between a member that terminated normally, a member that terminated abnormally, and a member that is not reachable through SCI.

DFSCCTX0 adds new completion code text for member abended, member shutdown, and member is not reachable through SCI.

DFSCMDRR adds new IMS completion codes for member abended, member shutdown, and member is not reachable through SCI.

DFSDBQ00 adds logic to handle the new RM reason codes by cleaning up the database quiesce process and converting the new RM reason codes into IMS reason codes.

DFSGPM10 adds logic to handle the new RM reason codes by converting them into IMS reason codes.

DFSOLC00 adds logic to handle the new RM completion codes by converting them to IMS reason codes.

**********************
*   DOCUMENTATION    *
**********************

These documentation changes are needed in subsequent IMS releases as well.

IMS V11 Commands, Volume 1: IMS Commands A - M
SC19-2430-02

INITIATE OLC command
Add the following completion codes:
14A  Command failed for the IMS because its SCI is not active so that the IMS is not reachable.
14B  Command failed for the IMS because it shut down normally during command processing.
14C  Command failed for the IMS because it terminated abnormally during command processing.
IMS V11 Commands, Volume 2: IMS Commands N - V
SC19-2431-02

TERMINATE OLC command
Add the following completion codes:
14A  Command failed for the IMS because its SCI is not active so that the IMS is not reachable.
14B  Command failed for the IMS because it shut down normally during command processing.
14C  Command failed for the IMS because it terminated abnormally during command processing.

UPDATE DB command
Add the following usage notes:
When a nonzero return code is received for the UPDATE DB START(QUIESCE) with OPTION(HOLD) command or the UPDATE DB STOP(QUIESCE) command, you may need to correct the problem reported by the IMS reason code/completion code and try the command again.
Add the following completion codes:
14A  Command failed for the IMS because its SCI is not active so that the IMS is not reachable.
14B  Command failed for the IMS because it shut down normally during database quiesce processing.
14C  Command failed for the IMS because it terminated abnormally during database quiesce processing.

UPDATE IMS command
Add the following completion codes:
14A  Command failed for the IMS because its SCI is not active so that the IMS is not reachable.
14B  Command failed for the IMS because it shut down normally during command processing.
14C  Command failed for the IMS because it terminated abnormally during command processing.

**********************
*     AUTOMATION     *
**********************

If the UPD DB START(QUIESCE), INITIATE OLC, TERMINATE OLC, or UPD IMS SET(PLEXPARM()) command is issued and one of the participants goes down or its SCI goes down, the command will now fail with new completion codes 14A, 14B, or 14C, rather than the command timing out. Users may want to change their automation to handle these new completion codes.
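
As one possible starting point for the automation change, the sketch below picks out the members that reported one of the new completion codes. Only the three codes and their meanings come from this APAR. The function names and the input shape, a list of (member name, completion code) pairs already extracted from the command response, are assumptions made for illustration; how the response is actually obtained and parsed depends on the automation product in use.

```python
from typing import Dict, Iterable, Optional, Tuple

# Meanings summarized from the completion codes documented in this APAR.
NEW_COMPLETION_CODES: Dict[str, str] = {
    "14A": "SCI is not active, so the IMS is not reachable",
    "14B": "IMS shut down normally during command processing",
    "14C": "IMS terminated abnormally during command processing",
}


def describe_member_failure(completion_code: str) -> Optional[str]:
    """Return the meaning of a new completion code, or None for any other code."""
    return NEW_COMPLETION_CODES.get(completion_code.strip().upper())


def members_that_went_down(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each member that reported 14A, 14B, or 14C to the reason.

    rows is a hypothetical list of (member name, completion code) pairs.
    """
    failed: Dict[str, str] = {}
    for member, completion_code in rows:
        reason = describe_member_failure(completion_code)
        if reason is not None:
            failed[member] = reason
    return failed


# Hypothetical response rows for illustration only.
response = [("IMS1", "0"), ("IMS2", "14C"), ("IMS3", "14a")]
for member, reason in members_that_went_down(response).items():
    print(f"{member}: {reason}")
```

An automation routine could use the returned map to decide which member to restart or investigate before the command is tried again, as suggested by the UPDATE DB usage note.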
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
PM74532
Reported component name
IMS V11
Reported component ID
5635A0200
Reported release
100
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2012-10-08
Closed date
2012-12-03
Last modified date
2013-01-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK83958
Modules/Macros
CSLRCODE CSLRPR20 CSLRPR30 CSLRREG0 CSLRRR CSLRTM10 CSLRTM20 CSLRXNF0 DFSCCTX0 DFSCMDRR DFSDBQ00 DFSGPM10 DFSOLC00
SC19243002 | SC19243102 |
Fix information
Fixed component name
IMS V11
Fixed component ID
5635A0200
Applicable component levels
R100 PSY UK83958
UP12/12/06 P F212
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
02 January 2013