IBM Support

IC71654: TAKEOVER HADR COMMAND HANGS UP ON STANDBY WHEN A TRAP HAS BEEN PREVIOUSLY SUSTAINED IN PRIMARY DATABASE

APAR status

  • Closed as program error.

Error description

  • The hang problem occurs if a takeover is issued on an HADR
    Standby when the HADR Primary has previously sustained a trap.
    
      On the HADR Standby: the takeover command will hang, and other
    commands such as 'db2stop force' will either hang or fail.
    
      On the HADR Primary: clients will be unable to connect.
    
      If the HADR Primary has previously sustained a trap, you will
    be able to see:
        1) ADM14012C or ADM14013C messages in the administration
    notification log ({instance_name}.nfy)
        AND
        2) A suspended db2agent in 'db2pd -EDUs' output.
    
      Even after the fix for APAR IC69960 is applied, the takeover
    command still hangs under the conditions above.
    
      When the takeover command fails under these conditions, the
    db2diag.log on the primary contains Severe error messages such
    as ADM14013C, which indicate that db2agents were suspended on
    the primary. For example:
    
      2010-09-27-14.35.38.415495+540 I1781400A564  LEVEL: Severe
      PID     : 1577038         TID  : 11054       PROC : db2sysc 0
      INSTANCE: db2inst1        NODE : 000         DB   : TESTDB
      APPHDL  : 0-367           APPID: 10.219.61.1.64526.100927053458
      AUTHID  : DB2INST1
      EDUID   : 11054           EDUNAME: db2agent (TESTDB) 0
      FUNCTION: DB2 UDB, RAS/PD component, pdResilienceIsSafeToSustain, probe:800
      DATA #1 : String, 37 bytes
      Trap Sustainability Criteria Checking
      DATA #2 : Hex integer, 8 bytes
      0x0000000000021000
      DATA #3 : Boolean, 1 bytes
      true
    
      ...
    
      2010-09-27-14.35.38.625896+540 E1813735A941  LEVEL: Severe
      PID     : 1577038         TID  : 11054       PROC : db2sysc 0
      INSTANCE: db2inst1        NODE : 000         DB   : TESTDB
      APPHDL  : 0-367           APPID: 10.219.61.1.64526.100927053458
      AUTHID  : DB2INST1
      EDUID   : 11054           EDUNAME: db2agent (TESTDB) 0 (suspended) 0
      FUNCTION: DB2 UDB, DRDA Application Server, sqljsTrapResilience, probe:800
      MESSAGE : ADM14013C  The following type of critical error occurred: "Trap".
                This error occurred because one or more threads that are associated
                with the current DB2 instance have been suspended, but the instance
                process is still running. First Occurrence Data Capture (FODC) was
                invoked in the following mode: "Automatic". FODC diagnostic
                information is located in the following directory:
                "/var/log/db2/FODC_Trap_2010-09-27-14.35.38.031284/".
    
      For more information on sustained traps, see:
    
      * Enhanced resilience to errors and traps reduces outages
    
    http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.wn.doc/doc/c0054512.html
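
The two indicators listed above (an ADM14012C or ADM14013C message, and a suspended db2agent in 'db2pd -EDUs' output) can be checked mechanically before issuing a takeover. The following is a minimal Python sketch, not an IBM-provided tool. It works on text that has already been captured from the notification log or db2diag.log and from 'db2pd -EDUs'. The function names are hypothetical, and the sketch assumes that a suspended agent appears on a single line containing both "db2agent" and "suspended", as in the EDUNAME field of the log excerpt above. The exact layout of 'db2pd -EDUs' output is not shown in this APAR, so the sample EDU lines in the sketch are illustrative only.

```python
import re

# Message IDs that the APAR text names as evidence of a sustained trap.
TRAP_MESSAGE_IDS = ("ADM14012C", "ADM14013C")


def find_trap_messages(log_text):
    """Return the sorted trap-related message IDs that appear in log_text."""
    return sorted({msg for msg in TRAP_MESSAGE_IDS if msg in log_text})


def find_suspended_agents(edu_text):
    """Return the lines that mention a db2agent marked as suspended.

    Assumption: one line per EDU, containing both 'db2agent' and
    'suspended'. Adjust the match if the real output differs.
    """
    return [
        line.strip()
        for line in edu_text.splitlines()
        if "db2agent" in line and "suspended" in line.lower()
    ]


def find_fodc_directories(log_text):
    """Return quoted FODC_Trap_* directory paths found in log_text."""
    return re.findall(r'"([^"\s]*FODC_Trap_[^"\s]*)"', log_text)


def primary_sustained_trap(log_text, edu_text):
    """True only when both indicators from the APAR description are present."""
    return bool(find_trap_messages(log_text)) and bool(find_suspended_agents(edu_text))


if __name__ == "__main__":
    # Log text taken from the excerpt in this APAR.
    sample_log = (
        'MESSAGE : ADM14013C  The following type of critical error occurred: "Trap".\n'
        "information is located in the following directory:\n"
        '"/var/log/db2/FODC_Trap_2010-09-27-14.35.38.031284/".\n'
    )
    # Hypothetical EDU listing; the real db2pd layout may differ.
    sample_edus = "11054 db2agent (TESTDB) 0 (suspended)\n11055 db2agent (idle) 0\n"

    print("Trap messages:", find_trap_messages(sample_log))
    print("Suspended agents:", find_suspended_agents(sample_edus))
    print("FODC directories:", find_fodc_directories(sample_log))
    print("Sustained trap on primary:", primary_sustained_trap(sample_log, sample_edus))
```

Requiring both indicators mirrors the "AND" in the description above: a trap message alone, or a suspended agent alone, is not treated as a match.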
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * The "takeover hadr" command hangs when a trap has been       *
    * sustained.                                                   *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to DB2 Version 9.7 FixPak 4                          *
    ****************************************************************
    

Problem conclusion

  • The problem was first fixed in Version 9.7 FixPak 4
    

Temporary fix

Comments

APAR Information

  • APAR number

    IC71654

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    970

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2010-10-04

  • Closed date

    2011-05-09

  • Last modified date

    2011-05-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • R970 PSN

       UP



Document information

More support for: DB2 for Linux, UNIX and Windows

Software version: 9.7

Reference #: IC71654

Modified date: 09 May 2011