IBM Support

IC74901: IN DPF, CONNECT OR CONNECT RESET HANGS DUE TO MISSING REPLY AFTER NODE FAILURE.

APAR status

  • Closed as program error.

Error description

  • In a multi-node (DPF) environment, if a node fails during
    CONNECT RESET processing, or during a failed CONNECT (which
    performs an implicit CONNECT RESET), the expected reply from a
    remote node to the connect reset can be missed. The coordinator
    agent then hangs with the following stack:
    
    sqloWaitEDUWaitPost
    WaitRecvReady
    ReceiveBuffer
    getNextBuffer
    sqlkd_rcv_buffer
    sqlkd_rcv_get_next_buffer
    sqlkd_rcv_init
    sqlkdReceiveReply
    sqleReceiveAndMergeReplies
    sqlkdInterrupt
    sqleDssStopUsing
    ForwardStopRequest
    AppStopUsing
    sqlesrspWrp
    sqleUCagentConnectReset
    sqljsCleanup
    sqljsDrdaAsInnerDriver
    sqljsDrdaAsDriver
    RunEDU
    
    An entry similar to the following is logged in db2diag.log on
    the coordinator node:
    
    2011-03-02-04.15.40.706078+540 I601932A472        LEVEL: Error
    PID     : 4841666              TID  : 4885        PROC : db2sysc 1
    INSTANCE: db2inst              NODE : 001         DB   : P64816
    APPHDL  : 1-51                 APPID: *N1.dpfv971.110301191344
    AUTHID  : DB2INST
    EDUID   : 4885                 EDUNAME: db2agent (sample) 1
    FUNCTION: DB2 UDB, buffer dist serv, sqlkdReceiveReply, probe:10
    RETCODE : ZRC=0x81590016=-2124873706=SQLKF_NODE_FAILED "Node Recovery"
    
    
    Another indication of this hang is one or more subagents,
    working on behalf of the coordinator's stop-using request,
    stuck in termination log sync (sqlpTermLogSync) on a
    non-coordinator node with this call stack:
    
    sqloWaitEDUWaitPost
    WaitRecvReady
    ReceiveBuffer
    getNextBuffer
    sqlkd_rcv_buffer
    sqlkd_rcv_get_next_buffer
    sqlkd_rcv_init
    sqlkdReceiveReply
    sqlpLSrequestor
    sqlpPerformTermLogSync
    sqlpTermLogSync
    sqlpterm
    CleanDB
    TermDbConnect
    AppStopUsing
    sqleSubAgentStopUsing
    sqleSubRequestRouter
    
    As a result of the hang, connection attempts to the node fail
    with SQL1229N.
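
The two symptoms described above (the SQLKF_NODE_FAILED entry from sqlkdReceiveReply and the characteristic call stacks) can be checked for mechanically. The following Python sketch is illustrative only and is not an IBM tool. It assumes the db2diag.log text is read from the actual file, so each record starts at column 0 with a timestamp, and that a stack has already been reduced to a list of bare function names, innermost frame first. All helper names (split_records, find_node_failed_replies, classify_stack) and the choice of "distinctive" frames are assumptions made for illustration.

```python
import re

# Assumption: a db2diag.log record starts at column 0 with a timestamp
# such as 2011-03-02-04.15.40.706078+540.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+\s", re.MULTILINE
)

# Distinctive frames taken from the two stacks in this APAR, innermost first.
# Which frames count as "distinctive" is a judgment call, not an IBM definition.
COORDINATOR_HANG = [
    "sqlkdReceiveReply",
    "sqleReceiveAndMergeReplies",
    "sqleDssStopUsing",
    "sqleUCagentConnectReset",
]
SUBAGENT_HANG = [
    "sqlkdReceiveReply",
    "sqlpTermLogSync",
    "sqleSubAgentStopUsing",
]


def split_records(text):
    """Split db2diag.log text into records, one per leading timestamp."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[a:b] for a, b in zip(starts, ends)]


def find_node_failed_replies(text):
    """Return timestamps of records where sqlkdReceiveReply logged SQLKF_NODE_FAILED."""
    return [
        record.split(None, 1)[0]
        for record in split_records(text)
        if "sqlkdReceiveReply" in record and "SQLKF_NODE_FAILED" in record
    ]


def contains_in_order(frames, signature):
    """True if every signature frame appears in frames, in the same order."""
    remaining = iter(frames)
    # "wanted in remaining" consumes the iterator up to the match, so each
    # later signature frame must appear after the previous one.
    return all(wanted in remaining for wanted in signature)


def classify_stack(frames):
    """Return 'coordinator', 'subagent', or None for a list of frame names."""
    if contains_in_order(frames, COORDINATOR_HANG):
        return "coordinator"
    if contains_in_order(frames, SUBAGENT_HANG):
        return "subagent"
    return None
```

A match from either function is only a hint that this APAR may apply. A SQLKF_NODE_FAILED entry can also be logged after node failures that do not lead to this hang, so the stack should be compared with the full traces listed above.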
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * DPF users                                                    *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error description field for more information.            *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to DB2 Version 9.7 Fix Pack 5 or later.              *
    ****************************************************************
    

Problem conclusion

  • The problem was first fixed in DB2 Version 9.7 Fix Pack 5.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IC74901

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    970

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2011-03-08

  • Closed date

    2012-02-13

  • Last modified date

    2012-02-13

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IC75211

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • R970 PSN UP



Document information

More support for: DB2 for Linux, UNIX and Windows

Software version: 9.7

Reference #: IC74901

Modified date: 13 February 2012