APAR status
Closed as canceled.
Error description
DEALING WITH HUNG COUPLING FACILITY CONNECTIONS
-----------------------------------------------

When a DB2 member abnormally terminates, cross-system extended services for z/OS (XES) puts its connections to the coupling facility (CF) structures into a FAILING state. The failing DB2 member remains in this state until all surviving members of the group have responded to the XES Disconnected/Failed Connection (DiscFailConn) event for each structure. XES sends this event to each surviving member of the group so that the necessary recovery actions can be taken in response to the failed member.

After all surviving members of the group perform the necessary recovery actions and respond to XES for the DiscFailConn event for a given CF structure, XES changes the failed DB2 member's connection status for that CF structure from FAILING to FAILED PERSISTENT. The DB2 member can reconnect to the CF structure on restart when the member's status is FAILED PERSISTENT.

When you restart the DB2 member immediately following a connection failure, it can attempt to reconnect to a CF structure while its connection is still in a FAILING state. If this occurs, XES denies the reconnect request with a 0C27 reason code. DB2 responds by entering a connection retry loop until the connection succeeds or until it reaches the maximum retry count.

The following message is issued when the system was started before the old structure was completely cleaned up:

  IXL013I IXLCONN REQUEST FOR STRUCTURE DSNxxxx_LOCK1 FAILED.
  JOBNAME: xxxxIRLM ASID: 0055 CONNECTOR NAME: DXRDBPG$$xxxx002
  IXLCONN RETURN CODE: 0000000C, REASON CODE: 02010C27

For the SCA, the maximum retry count is 200, with a 3-second interval between attempts. For the GBPs, the maximum retry count is 5, with a 10-second interval between attempts.

When joining the data sharing group, IRLM connects using the currently established naming protocols, which allow it to reuse a failed persistent connection.
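The retry limits quoted above imply a rough upper bound on how long DB2 stays in the connection retry loop for each structure type. The following is a minimal sketch of that arithmetic only. The function and dictionary names are illustrative and are not part of DB2 or z/OS, and the result is approximate because this APAR does not state whether the interval also applies before the first attempt or after the last one.

```python
# Retry limits as stated in this APAR: (maximum attempts, seconds between attempts).
# The names used here are illustrative only.
RETRY_LIMITS = {
    "SCA": (200, 3),
    "GBP": (5, 10),
}


def max_retry_window_seconds(structure_type: str) -> int:
    """Approximate upper bound on the time spent retrying the connection."""
    attempts, interval = RETRY_LIMITS[structure_type]
    return attempts * interval


if __name__ == "__main__":
    for kind in RETRY_LIMITS:
        print(kind, max_retry_window_seconds(kind), "seconds")
```

Under these assumptions the SCA retry loop lasts roughly 600 seconds (about 10 minutes) and a GBP retry loop roughly 50 seconds, which is why repeated IXL013I messages over several minutes are expected during an SCA reconnect.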
Because recovery for the failed connection might not be complete, IRLM can receive a 0C27 reason code on the CONNECT. IRLM tolerates the 0C27 reason code on the first CONNECT attempt only. It changes the connection name slightly and tries the CONNECT again. Because the join to XCF has already completed successfully, IRLM does not disconnect from XCF when it changes the name. As a result, the XCF group name no longer matches the lock structure group name. If the second attempt to join the group also receives a 0C27 reason code from XES, IRLM stops the group initialization process and denies the DBMS identify.

**IMPORTANT NOTE** If IRLM connects to the structure as a result of RSN0C27, there will be a FAILED PERSISTENT connection for this member with the ORIGINAL CONNECTION ID. Failed persistent connections are deleted by a new IRLM joining the group if no RLEs are associated with the connection and it is no longer needed. Users should ** NOT ** delete these FAILED PERSISTENT connections.

The following messages can be seen when an IRLM either joins or leaves the group:

  IXL030I CONNECTOR STATISTICS
  IXL031I CONNECTOR CLEANUP ... FOR CONNECTOR n HAS COMPLETED

You might notice a message similar to the following, which indicates a failed connection attempt:

  IXL013I IXLCONN REQUEST FOR STRUCTURE DB2GR0W_SCA FAILED.
  JOBNAME: DB2VMSTR ASID: 05E1 CONNECTION NAME: DB2_DB2V
  IXLCONN RETURN CODE: 0000000C, REASON CODE: 02010C27

The preceding message might be displayed multiple times while DB2 is in a connection retry loop. This is normal.

In rare cases, one or more of the surviving members of a group encounter difficulties in providing the DiscFailConn response to XES for a given CF structure.
XES issues a message similar to the following for each DB2 member from which it does not receive a response within two minutes:

  IXL041I CONNECTOR NAME:DB2_DB2M, JOBNAME:DB2MMSTR, ASID:0086
  HAS NOT RESPONDED TO THE DISCONNECTED/FAILED CONNECTION EVENT
  FOR SUBJECT CONNECTION: DB2_DB2V.
  DISCONNECT/FAILURE PROCESS FOR STRUCTURE DB2GR0W_SCA CANNOT CONTINUE.
  MONITORING FOR RESPONSE STARTED: 08/08/2002 23:50:23.
  DIAG: 0000 0000 00000000

In extreme cases, the maximum number of connection retries is reached. If this happens for the SCA, the failed member cannot restart, and DB2 issues a message similar to the following:

  DSN7506A -DB2V DSN7LSTK CONNECTION TO THE SCA STRUCTURE DB2GR0W_SCA FAILED.
  MVS IXLCONN RETURN CODE = 0000000C, MVS IXLCONN REASON CODE = 02010C27.

IRLM repeatedly retries the connection, and the following message is seen:

  DXR133I xxxx002 TIMEOUT DURING GLOBAL INITIALIZATION WAITING FOR aaaa

Eventually IRLM terminates with the following message:

  DXR122E xxxx013 ABEND UNDER IRLM TCB/SRB IN MODULE DXRRL732 ABEND CODE=U2025 ( U2025 )

Operator actions for dealing with hung CF connections
-------------------------------------------------------

Perform the following actions to recover from hung CF structure connections.

1. Save a dump of all DB2 and IRLM members, along with SDATA=(COUPLE,XESDATA), so that IBM Software Support can determine what is causing the hung connections. See II10850 for more information.

2. Attempt a rebuild of the lock structure. Sometimes the SCA rebuild process is suspended on an IRLM lock request, and a rebuild of the lock structure might shake loose a stalled lock request and clear the condition that is causing the DiscFailConn response to hang. If the rebuild of the lock structure works, XES issues a message similar to the following for each group member as it provides the required DiscFailConn response:

   IXL043I CONNECTION NAME: DB2_DB2M, JOBNAME: DB2MMSTR, ASID: 0086 HAS PROVIDED THE REQUIRED RESPONSE.
   THE REQUIRED RESPONSE FOR THE DISCONNECTED/FAILED CONNECTION EVENT FOR SUBJECT CONNECTION DB2_DB2V, STRUCTURE DB2GR0W_SCA IS NO LONGER EXPECTED

   If the rebuild does not work, proceed to Step 3.

3. Issue the D XCF,STR,STRNM=<strname>,CONNM=<conname> command for the structure and connector that are in the FAILING state. Alternatively, issue the D XCF,STR,STRNM=<strname>,CONNM=ALL command. If this command identifies the unresponsive members, skip to Step 6. If it does not identify the unresponsive members, proceed to Step 4.

4. Attempt a structure rebuild for the affected structure, if you have not already done so.

5. If the rebuild hangs, issue the D XCF,STR,STRNM=<strname> command to identify the members that are unresponsive to the rebuild. These members are probably the same members that are unresponsive to the DiscFailConn event.

6. Cancel or recycle the unresponsive members. The STOP DB2 command might not work because internal DB2 processes are hung, so issue the MODIFY irlmproc,ABEND command to bring down IRLM, or cancel IRLM and DB2 MSTR. As each member terminates, ensure that XES issues message IXL043I to indicate that it no longer expects a DiscFailConn response from that member. When all members that owe responses have been stopped, all connections to the SCA go away.

7. Issue the D XCF,STR,STRNM=<sca>,CONNM=ALL command to verify the status of the connections to the SCA.

8. Restart all DB2 members with FAILED PERSISTENT connections. As each member successfully reconnects to the SCA, XES issues message IXL014I. If a problem still exists, proceed to Step 9.

9. Take down and restart the systems on which the unresponsive members are running. If restarting the systems does not fix the unresponsive members, proceed to Step 10.

10. Cancel or recycle all connections to CF structures. If a problem still exists, proceed to Step 11.

11. Take down and restart all systems.
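Steps 3 through 6 depend on knowing which connectors still owe a DiscFailConn response. As a hedged illustration, the sketch below scans captured console message text for IXL041I and IXL043I and reports the connectors that have not yet responded. It is not an IBM-supplied tool. The patterns are modeled only on the sample messages quoted in this APAR, actual message layouts can differ by release, and the function and variable names are hypothetical. It assumes that each message has already been collected as one string.

```python
import re

# Patterns modeled only on the sample messages quoted in this APAR (assumption:
# real console output may wrap or word these messages differently).
OWES = re.compile(
    r"IXL041I CONNECTOR NAME:\s*(?P<conn>[^,\s]+),"
    r".*?SUBJECT CONNECTION:\s*(?P<subject>[^.,\s]+)"
    r".*?STRUCTURE\s+(?P<strname>\S+)"
)
DONE = re.compile(
    r"IXL043I CONNECTION NAME:\s*(?P<conn>[^,\s]+),"
    r".*?SUBJECT CONNECTION:?\s*(?P<subject>[^.,\s]+)"
    r".*?STRUCTURE\s+(?P<strname>\S+)"
)

# Sample messages copied from this APAR.
SAMPLE_IXL041I = (
    "IXL041I CONNECTOR NAME:DB2_DB2M, JOBNAME:DB2MMSTR, ASID:0086 "
    "HAS NOT RESPONDED TO THE DISCONNECTED/FAILED CONNECTION EVENT FOR "
    "SUBJECT CONNECTION: DB2_DB2V. DISCONNECT/FAILURE PROCESS FOR "
    "STRUCTURE DB2GR0W_SCA CANNOT CONTINUE."
)
SAMPLE_IXL043I = (
    "IXL043I CONNECTION NAME: DB2_DB2M, JOBNAME: DB2MMSTR, ASID: 0086 "
    "HAS PROVIDED THE REQUIRED RESPONSE. THE REQUIRED RESPONSE FOR THE "
    "DISCONNECTED/FAILED CONNECTION EVENT FOR SUBJECT CONNECTION DB2_DB2V, "
    "STRUCTURE DB2GR0W_SCA IS NO LONGER EXPECTED"
)


def outstanding_responders(messages):
    """Map (structure, failed connection) to the connectors that still owe a response."""
    owed = {}
    for raw in messages:
        text = " ".join(raw.split())  # collapse wrapped console lines
        match = OWES.search(text)
        if match:
            key = (match["strname"], match["subject"])
            owed.setdefault(key, set()).add(match["conn"])
            continue
        match = DONE.search(text)
        if match:
            key = (match["strname"], match["subject"])
            owed.get(key, set()).discard(match["conn"])
    return owed


if __name__ == "__main__":
    print(outstanding_responders([SAMPLE_IXL041I]))
    print(outstanding_responders([SAMPLE_IXL041I, SAMPLE_IXL043I]))
```

Messages must be supplied in the order in which they were issued. An empty set for a structure suggests that every connector named in an IXL041I message has since been covered by an IXL043I message, which is the condition Step 6 asks the operator to confirm before restarting the failed member.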
Refer to z/OS Recovery and Reconfiguration Guide for more information.

Other references:
- MVS APAR OW46531: the closing text gives additional information on the proper handling of 'FAILING' CF structure connections.
- MVS doc APAR OW29300.
Local fix
Problem summary
Problem conclusion
Temporary fix
Comments
can Informational apar
APAR Information
APAR number
II13538
Reported component name
PB LIB INFO ITE
Reported component ID
INFOPBLIB
Reported release
001
Status
CLOSED CAN
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2003-03-20
Closed date
2005-01-05
Last modified date
2006-04-26
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
Document Information
Modified date:
26 April 2006