II13538: DEALING WITH HUNG COUPLING FACILITY CONNECTIONS IXLCONN REASON CODE 02010C27 0C27 02010C09

APAR status

Closed as canceled.

Error description

DEALING WITH HUNG COUPLING FACILITY CONNECTIONS
-----------------------------------------------
When a DB2 member abnormally terminates, its connections
to the coupling facility structures are put into a
FAILING state by cross-system extended services for z/OS
(XES). The FAILING DB2 member remains in this state until
all surviving members of the group have responded to the
XES Disconnected/Failed Connection (DiscFailConn) event for
each structure. XES sends this event to each surviving
member of the group so that the necessary recovery actions
can be taken in response to the failed member.

After all surviving members of the group perform the
necessary recovery actions and provide a response to XES
for the DiscFailConn event for a given CF structure, XES
changes the failed DB2 member's connection status for that
CF structure from FAILING to FAILED PERSISTENT. The DB2
member can reconnect to the CF structure on restart when the
member's status is FAILED PERSISTENT.
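
The status transition described above can be pictured with a small model. This is only an illustrative sketch, not XES code: the class name, its fields, and the member names in the usage note are hypothetical. It assumes, as the text states, that the connection stays FAILING until every surviving member has responded to the DiscFailConn event.

```python
from dataclasses import dataclass, field


@dataclass
class FailedConnection:
    """Toy model of one failed connector's status for one CF structure."""

    survivors: set                                # members that owe a DiscFailConn response
    responded: set = field(default_factory=set)   # members that have already responded

    def respond(self, member: str) -> None:
        """Record a DiscFailConn response from a surviving member."""
        if member not in self.survivors:
            raise ValueError(f"{member} is not a surviving member")
        self.responded.add(member)

    def owing(self) -> set:
        """Members that have not yet responded."""
        return self.survivors - self.responded

    @property
    def status(self) -> str:
        # The connection stays FAILING until every survivor has responded.
        return "FAILING" if self.owing() else "FAILED PERSISTENT"

    def can_reconnect(self) -> bool:
        # A reconnect attempted while still FAILING is the 0C27 case.
        return self.status == "FAILED PERSISTENT"
```

For example, with survivors DB2M and DB2N, the status remains FAILING after only DB2M responds, and changes to FAILED PERSISTENT once DB2N responds as well.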

When you restart the DB2 member immediately following a
connection failure, it can attempt to reconnect to a CF
structure while its connection is still in a FAILING state.
If this occurs, XES denies the reconnect request with a 0C27
reason code. DB2 responds to this by entering a connection
retry loop until the connection succeeds or until it reaches
the maximum retry count.

The following message is issued when the system was started
before the old structure was completely cleaned up:

IXL013I IXLCONN REQUEST FOR STRUCTURE DSNxxxx_LOCK1 FAILED.
  JOBNAME: xxxxIRLM ASID: 0055 CONNECTOR NAME: DXRDBPG$$xxxx002
  IXLCONN RETURN CODE: 0000000C,  REASON CODE: 02010C27

For the SCA, the maximum retry count is 200 times with a 3
second interval between each attempt. For the GBPs, the
maximum retry count is 5 times with a 10 second interval
between each attempt.
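
As a rough worked example, the retry limits above translate into the following approximate retry windows. The figures come from the text; the dictionary and function names are illustrative only, and the result ignores the time each connect attempt itself takes.

```python
# (maximum attempts, seconds between attempts), as stated in the text above
RETRY_LIMITS = {
    "SCA": (200, 3),
    "GBP": (5, 10),
}


def approx_retry_window_seconds(structure_type: str) -> int:
    """Approximate time DB2 keeps retrying before it gives up."""
    attempts, interval = RETRY_LIMITS[structure_type]
    return attempts * interval


for name in RETRY_LIMITS:
    print(name, approx_retry_window_seconds(name), "seconds")
# SCA 600 seconds
# GBP 50 seconds
```

In other words, DB2 retries the SCA connection for roughly 10 minutes, but a GBP connection for less than one minute.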

When joining the data sharing group, IRLM connects using the
currently established name protocols, which allow it to reuse a
failed persistent connection.  Because recovery for the failed
connection may NOT be complete, it may get an 0C27 reason code
on the CONNECT.

IRLM will tolerate the 0C27 reason code on the first CONNECT
attempt only.  It will change the connection name slightly and
try again to CONNECT.  Since the join to XCF was already
successfully done, IRLM will not disconnect from XCF when it
changes the name.  As a result, the XCF group name will no
longer match the lock structure group name.

If the second attempt to join the group also gets an 0C27
reason code from XES, IRLM will stop the group initialization
process and DENY the DBMS identify.
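
The two-attempt behavior described above can be sketched as follows. This is not IRLM code. The callables try_connect and alter_name are hypothetical stand-ins: the text does not say how IRLM alters the connection name, nor what it does for reason codes other than 0C27, so both are left to the caller or marked as unhandled.

```python
RSN_STILL_FAILING = 0x0C27  # low-order halfword of reason code 02010C27


def irlm_connect(try_connect, conn_name, alter_name):
    """Model of the CONNECT logic: tolerate 0C27 once, then deny.

    try_connect(name) -> 0 on success, else the low halfword of the reason code.
    alter_name(name)  -> a slightly changed connection name (hypothetical).
    """
    rsn = try_connect(conn_name)
    if rsn == 0:
        return ("CONNECTED", conn_name)
    if rsn != RSN_STILL_FAILING:
        # Other failures are not described in the text; not modeled here.
        return ("OTHER_FAILURE", conn_name)

    # First 0C27 is tolerated: change the name and try exactly once more.
    new_name = alter_name(conn_name)
    if try_connect(new_name) == 0:
        # The original name is left behind as a failed persistent connection.
        return ("CONNECTED", new_name)

    # Second failure: stop group initialization and deny the DBMS identify.
    return ("DENIED", new_name)
```

A caller would pass its own connect routine and naming rule; the sketch only fixes the order of events: one tolerated 0C27, one renamed retry, then denial.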

**IMPORTANT NOTE**
If IRLM connects to the structure as a result of RSN0C27, there
will be a FAILED PERSISTENT CONNECTION for this member with the
ORIGINAL CONNECTION ID.  Failed persistent connections will be
deleted by a new IRLM joining the group if no RLEs are
associated with the connection and it is no longer needed.
Users should ** NOT ** delete these FAILED PERSISTENT
connections.

The following messages can be seen when an IRLM either joins
or leaves the group:
 IXL030I CONNECTOR STATISTICS
 IXL031I CONNECTOR CLEANUP ... FOR CONNECTOR n HAS
      COMPLETED

You may notice a message similar to the following message,
which indicates a failed connection attempt:


IXL013I IXLCONN REQUEST FOR STRUCTURE DB2GR0W_SCA FAILED.
     JOBNAME: DB2VMSTR ASID: 05E1 CONNECTION NAME: DB2_DB2V
     IXLCONN RETURN CODE: 0000000C,   REASON CODE: 02010C27

The preceding message might be displayed multiple times
while DB2 is in a connection retry loop. This is normal.
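
If the system log is being scanned by a script, the IXL013I message shown above can be picked apart with a regular expression. The sketch below is based only on the two message samples in this document (which use both CONNECTION NAME and CONNECTOR NAME); the function name and the still_failing key are illustrative, and the real message layout may vary by release.

```python
import re

# Pattern derived from the IXL013I samples in this document only.
IXL013I = re.compile(
    r"IXL013I IXLCONN REQUEST FOR STRUCTURE (?P<structure>\S+) FAILED\."
    r"\s+JOBNAME:\s*(?P<jobname>\S+)\s+ASID:\s*(?P<asid>[0-9A-F]+)"
    r"\s+CONNECT(?:ION|OR) NAME:\s*(?P<connector>\S+)"
    r"\s+IXLCONN RETURN CODE:\s*(?P<rc>[0-9A-F]{8}),"
    r"\s*REASON CODE:\s*(?P<rsn>[0-9A-F]{8})"
)

SAMPLE = """\
IXL013I IXLCONN REQUEST FOR STRUCTURE DB2GR0W_SCA FAILED.
     JOBNAME: DB2VMSTR ASID: 05E1 CONNECTION NAME: DB2_DB2V
     IXLCONN RETURN CODE: 0000000C,   REASON CODE: 02010C27
"""


def parse_ixl013i(text):
    """Return the message fields as a dict, or None if no IXL013I is found."""
    match = IXL013I.search(text)
    if match is None:
        return None
    info = match.groupdict()
    # Reason code ending in 0C27: the old connection is still FAILING.
    info["still_failing"] = info["rsn"].endswith("0C27")
    return info


print(parse_ixl013i(SAMPLE))
```

Counting how many times the message repeats for the same structure and connector gives a rough idea of how far into the retry loop DB2 is.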

In rare cases, one or more of the surviving members of a
group will encounter difficulties in providing the
DiscFailConn response to XES for a given CF structure. XES
issues a message similar to the following message for each
DB2 member that it does not receive a response from within
two minutes:

IXL041I CONNECTOR NAME:DB2_DB2M, JOBNAME:DB2MMSTR, ASID:0086
  HAS NOT RESPONDED TO THE DISCONNECTED/FAILED CONNECTION
  EVENT FOR SUBJECT CONNECTION: DB2_DB2V.
  DISCONNECT/FAILURE PROCESS FOR STRUCTURE DB2GR0W_SCA CANNOT
  CONTINUE.
  MONITORING FOR RESPONSE STARTED: 08/08/2002 23:50:23.
  DIAG: 0000 0000 00000000
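
The IXL041I message names the member that still owes a response. A minimal sketch for extracting that information from captured log text follows. The pattern is derived only from the sample above, so the exact layout is an assumption, and the function name is illustrative.

```python
import re

# Pattern derived from the IXL041I sample in this document only.
IXL041I = re.compile(
    r"IXL041I CONNECTOR NAME:\s*(?P<connector>[^,\s]+),\s*"
    r"JOBNAME:\s*(?P<jobname>[^,\s]+),\s*ASID:\s*(?P<asid>[0-9A-F]+)"
    r".*?SUBJECT CONNECTION:\s*(?P<subject>\S+)"
    r".*?FOR STRUCTURE\s+(?P<structure>\S+)",
    re.DOTALL,
)

SAMPLE_LOG = """\
IXL041I CONNECTOR NAME:DB2_DB2M, JOBNAME:DB2MMSTR, ASID:0086
  HAS NOT RESPONDED TO THE DISCONNECTED/FAILED CONNECTION
  EVENT FOR SUBJECT CONNECTION: DB2_DB2V.
  DISCONNECT/FAILURE PROCESS FOR STRUCTURE DB2GR0W_SCA CANNOT
  CONTINUE.
"""


def unresponsive_connectors(log_text):
    """List every connector reported by IXL041I as not having responded."""
    found = []
    for match in IXL041I.finditer(log_text):
        entry = match.groupdict()
        # The subject connection name is followed by a period in the message.
        entry["subject"] = entry["subject"].rstrip(".")
        found.append(entry)
    return found


for entry in unresponsive_connectors(SAMPLE_LOG):
    print(entry["jobname"], "owes a response for", entry["subject"],
          "on", entry["structure"])
```

The job names collected this way are the candidates for the cancel/recycle action in the operator procedure.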

In extreme cases, the maximum number of connection retries
will be reached. If encountered for the SCA, this prevents
the failed member from restarting and DB2 issues a message
similar to the following message:

DSN7506A  -DB2V DSN7LSTK
CONNECTION TO THE SCA STRUCTURE DB2GR0W_SCA FAILED.
 MVS IXLCONN RETURN CODE = 0000000C,
 MVS IXLCONN REASON CODE = 02010C27.

IRLM retries the connection repeatedly, and messages similar
to the following are issued:
 DXR133I xxxx002 TIMEOUT DURING GLOBAL INITIALIZATION WAITING
        FOR aaaa
Eventually IRLM issues:
  DXR122E xxxx013 ABEND UNDER IRLM TCB/SRB IN MODULE DXRRL732
         ABEND CODE=U2025 ( U2025 )

Operator actions for dealing with hung CF connections
-------------------------------------------------------
Perform the following actions to recover from hung CF
structure connections.

1. Take a dump of all DB2 and IRLM members, specifying
   SDATA=(COUPLE,XESDATA), so that IBM Software Support can
   determine what is causing the hung connections. See
   informational APAR II10850 for more information.

2. Attempt a rebuild of the lock structure. Sometimes
   the SCA rebuild process is suspended on an IRLM
   lock request, and there is a chance that a rebuild
   of the lock structure can shake loose a stalled lock
   request and clear the condition that is causing the
   DiscFailConn response to hang. If the rebuild of the lock
   structure works, XES issues a message similar to the
   following message for each group member as it provides
   the required DiscFailConn response:

IXL043I CONNECTION NAME: DB2_DB2M, JOBNAME: DB2MMSTR,
   ASID: 0086 HAS PROVIDED THE REQUIRED RESPONSE. THE
   REQUIRED RESPONSE FOR THE DISCONNECTED/FAILED CONNECTION
   EVENT FOR SUBJECT CONNECTION DB2_DB2V,
   STRUCTURE DB2GR0W_SCA IS NO LONGER EXPECTED

   If the Rebuild does not work, proceed to step 3.

3. Issue the D XCF,STR,STRNM=<strname>,CONNM=<conname>
   command for the structure/connector that is in the
   FAILING state. Alternatively, issue the D XCF,STR,STRNM=
   <strname>,CONNM=ALL command.

   If this command identifies the unresponsive members, skip
   to Step 6. If it does not identify the unresponsive
   members, proceed to Step 4.

4. Attempt a structure Rebuild for the affected structure,
   if you have not already done this.

5. If the Rebuild hangs, issue the D XCF,STR,STRNM=<strname>
   command to identify the unresponsive connector.

   This will identify the members that are unresponsive to
   the Rebuild. These members are probably the same members
   that are unresponsive to the DiscFailConn event.

6. Cancel/recycle the unresponsive members. The STOP DB2
   command might not work because internal DB2 processes
   are hung, so issue the MODIFY irlmproc,ABEND command to
   bring down IRLM, or cancel IRLM and DB2 MSTR.

   As each member terminates, ensure that XES issues message
   IXL043I to indicate that it no longer expects a
   DiscFailConn response from that member. When all members
   that owe responses have been stopped, all connections to
   the SCA go away.

7. Issue the D XCF,STR,STRNM=<sca>,CONNM=ALL command to
   verify the status of the connections to SCA.

8. Restart all DB2 members with FAILED PERSISTENT
   connections.
   As each member successfully reconnects to the SCA, XES
   issues message IXL014I. If a problem still exists,
   proceed to Step 9.

9. Take down/restart the systems on which the unresponsive
   members are running. If restarting the system does not
   fix the unresponsive members, proceed to Step 10.

10. Cancel/recycle all connections to CF structures. If a
    problem still exists, proceed to Step 11.

11. Take down/restart all systems.
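
For sites that drive the console from automation, the commands quoted in the steps above can be assembled with small helpers. This is only a convenience sketch: the function names are hypothetical, the command text is taken from the steps above, and nothing here submits a command to the system.

```python
def display_structure(strname, connname=None):
    """Build the D XCF,STR command used in steps 3, 5 and 7.

    Pass connname="ALL" to list every connector, or omit it to display
    the structure only (as in step 5).
    """
    command = f"D XCF,STR,STRNM={strname}"
    if connname is not None:
        command += f",CONNM={connname}"
    return command


def irlm_abend(irlmproc):
    """Build the MODIFY command from step 6 that brings down IRLM."""
    return f"MODIFY {irlmproc},ABEND"


# Structure and connector names below are the ones used in the samples above.
print(display_structure("DB2GR0W_SCA", "DB2_DB2V"))
print(display_structure("DB2GR0W_SCA", "ALL"))
print(display_structure("DB2GR0W_SCA"))
```

The irlmproc argument is the IRLM procedure name for the member being recycled, as in the step 6 text.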


Refer to z/OS Recovery and Reconfiguration Guide for more
information.
Other references:
- MVS APAR OW46531 closing text gives additional information on
  proper handling of 'FAILING' CF structure connections.
- MVS doc APAR OW29300.

Local fix

Problem summary

Problem conclusion

Temporary fix

Comments

```
can
Informational apar
```

APAR Information

APAR number
II13538
Reported component name
PB LIB INFO ITE
Reported component ID
INFOPBLIB
Reported release
001
Status
CLOSED CAN
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2003-03-20
Closed date
2005-01-05
Last modified date
2006-04-26

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

[{"Business Unit":{"code":null,"label":null},"Product":{"code":"SG19O","label":"APARs - MVS environment"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"001","Edition":"","Line of Business":{"code":"","label":""}},{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSEPEK","label":"Db2 for z\/OS"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"001","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
26 April 2006
