IBM Support

IT24028: DB2 LOG REPLAY (RECOVERY, RFWD, HADR) MIGHT HANG WITH DB2REDOM IN SQLPRGETFREEQE->WAIT AND DB2REDOWS IN SQLPRFINDQUEUE->WAIT.


APAR status

  • Closed as program error.

Error description

  • Under rare conditions, typically when a long sequence
    (thousands) of single-record transactions without a commit has
    to be replayed, Db2 log replay might hang with all EDUs ending
    up in a wait state. The affected log replay scenarios are:
    - crash recovery
    - rollforward
    - HADR replication
    
    In the case of crash recovery, "db2pd -recovery" and "list
    utilities" will show an ongoing recovery, but "completed work"
    will not advance. Stacks from the EDUs involved in recovery will
    show the recovery master (db2redom) in:
    
    sqloWaitInterrupt
    sqloWaitEDUWaitPost
    sqlprGetFreeQE
    sqlpPRecReadLog
    sqlpParallelRecovery
    
    and all recovery workers (db2redow) in:

    sqloWaitInterrupt
    sqloWaitEDUWaitPost
    sqlprFindQueue
    sqlpPRecProcLog
    sqlpParallelRecovery
    sqleSubCoordProcessRequest

    The same EDUs are involved in the remaining scenarios
    (rollforward and HADR).

    The condition leading to the hang is very likely to cause the
    recovery master to grow the transaction table, which triggers a
    message from db2redom in db2diag.log similar to this one:

    2018-02-01-12.00.00.850000+060 I179497F539          LEVEL: Info
    PID     : 5092                 TID : 4488           PROC : db2syscs
    INSTANCE: DB2                  NODE : 000           DB   : SAMPLE
    APPHDL  : 0-7                  APPID: *LOCAL.DB2.180201115810
    AUTHID  : db2inst1             HOSTNAME: db2host
    EDUID   : 4488                 EDUNAME: db2redom (SAMPLE) 0
    FUNCTION: DB2 UDB, data protection services, sqlptintMore, probe:701
    DATA #1 : <preformatted>
    Current usable transaction entries are 14463 on log stream 0.
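
Matching a suspected hang against the symptoms above can be scripted. The POSIX shell sketch below is hypothetical: the function names are invented for illustration, it assumes each stack dump is a plain-text file holding one EDU's call stack (for example, one of the files written by `db2pd -stack all`), and it assumes db2diag.log records are separated by blank lines. It checks for the wait frames listed above and prints the db2redom transaction-table message.

```shell
#!/bin/sh
# Hypothetical helpers for matching the IT24028 symptoms; not IBM tooling.

# stack_has_frames FILE FRAME...
# Succeeds only if every FRAME name appears somewhere in the text stack dump FILE.
stack_has_frames() {
    file=$1
    shift
    for frame in "$@"; do
        grep -q -- "$frame" "$file" || return 1
    done
    return 0
}

# is_redom_waiting FILE: recovery master (db2redom) waiting for a free queue entry.
is_redom_waiting() {
    stack_has_frames "$1" sqlprGetFreeQE sqlpPRecReadLog
}

# is_redow_waiting FILE: recovery worker (db2redow) waiting in sqlprFindQueue.
is_redow_waiting() {
    stack_has_frames "$1" sqlprFindQueue sqlpPRecProcLog
}

# scan_diag FILE
# Prints the "Current usable transaction entries" text from every db2diag.log
# record that mentions both db2redom and sqlptintMore.
scan_diag() {
    awk 'BEGIN { RS = "" }   # paragraph mode: one record per blank-line-separated block
         /db2redom/ && /sqlptintMore/ {
             if (match($0, /Current usable transaction entries are [0-9]+ on log stream [0-9]+/))
                 print substr($0, RSTART, RLENGTH)
         }' "$1"
}
```

Matching on frame names as substrings keeps the check tolerant of decorated or mangled symbol names in the dump; it does not verify frame order.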
    

Local fix

  • The problem is related to the internal logic that parallelizes
    work during recovery, which depends on the number of recovery
    worker EDUs (db2redow). By default, their number is calculated
    from the number of CPUs. In case of a hang like this, one can
    try to force Db2 to use a higher number of recovery workers
    through DB2BPVARS:
    $ echo "PREC_NUM_AGENTS=64" > db2bpvars.cfg
    $ db2set DB2BPVARS=$(pwd)/db2bpvars.cfg
    and check whether recovery can then complete. The setting
    requires an instance restart to take effect and should be
    cleared once the problem is fixed.
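
The workaround can be wrapped in a small script so that it is applied and later reverted the same way. This is a sketch under stated assumptions: the function names and the DRY_RUN and CFG_DIR variables are hypothetical, 64 is the example value from above rather than a tuned recommendation, it assumes `db2set` removes a registry variable when given an empty value (`DB2BPVARS=`), and it restarts the instance with `db2stop force` followed by `db2start`, which disconnects all applications.

```shell
#!/bin/sh
# Hypothetical wrapper for the IT24028 local fix: raise the number of
# recovery workers through DB2BPVARS, and remove the override afterwards.
# With DRY_RUN=1 the Db2 commands are only printed; the cfg file is still written or removed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Location of db2bpvars.cfg; defaults to the current directory.
cfg_path() {
    echo "${CFG_DIR:-$(pwd)}/db2bpvars.cfg"
}

apply_workaround() {
    cfg=$(cfg_path)
    # Example value from the local fix, not a tuned number.
    echo "PREC_NUM_AGENTS=${NUM_AGENTS:-64}" > "$cfg"
    run db2set "DB2BPVARS=$cfg"
    # The registry change only takes effect after an instance restart.
    run db2stop force
    run db2start
}

revert_workaround() {
    # Assumption: an empty value removes DB2BPVARS from the instance registry.
    run db2set "DB2BPVARS="
    rm -f "$(cfg_path)"
    run db2stop force
    run db2start
}
```

For example, running `apply_workaround` with DRY_RUN=1 set prints the db2set, db2stop and db2start commands without executing them, which is a cheap way to review the change before touching a hung instance.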
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * ALL                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * The complete fix for this problem first appears in DB2       *
    * Version 11.1.3.3 iFix001 and all the subsequent Fix Packs.   *
    ****************************************************************
    

Problem conclusion

  • The complete fix for this problem first appears in DB2 Version
    11.1.3.3 iFix001 and all the subsequent Fix Packs.
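
To check whether an installed 11.1 level already contains the fix, its four-part version and interim fix number can be compared against 11.1.3.3 iFix001. The helper below is hypothetical: it takes the version and iFix number as separate arguments (for example, as read from db2level output), assumes later iFixes on 11.1.3.3 are cumulative, and deliberately reports other release lines as unknown because this APAR only names 11.1 levels.

```shell
#!/bin/sh
# has_it24028_fix VERSION IFIX  (hypothetical helper)
# VERSION: four-part Db2 level such as 11.1.3.3; IFIX: interim fix number (0 if none).
# Prints "fixed", "not fixed", or "unknown".
has_it24028_fix() {
    version=$1
    ifix=${2:-0}
    old_ifs=$IFS
    IFS=.
    set -- $version          # split the version on dots into $1..$4
    IFS=$old_ifs
    if [ "$#" -ne 4 ] || [ "$1" != 11 ] || [ "$2" != 1 ]; then
        echo unknown
        return
    fi
    case "$3$4$ifix" in
        *[!0-9]*|'') echo unknown; return ;;
    esac
    mod=$3
    fp=$4
    # First fixed level: 11.1.3.3 iFix001; any later mod/fix pack also has the fix.
    if [ "$mod" -gt 3 ] || { [ "$mod" -eq 3 ] && [ "$fp" -gt 3 ]; }; then
        echo fixed
    elif [ "$mod" -eq 3 ] && [ "$fp" -eq 3 ] && [ "$ifix" -ge 1 ]; then
        echo fixed
    else
        echo "not fixed"
    fi
}
```

For instance, `has_it24028_fix 11.1.3.3 0` reports the base 11.1.3.3 level as not fixed, while `has_it24028_fix 11.1.4.4 0` reports it as fixed.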
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT24028

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    B10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2018-02-12

  • Closed date

    2018-05-22

  • Last modified date

    2018-05-22

  • APAR is sysrouted FROM one or more of the following:

    IT24027

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels



Document information

More support for: DB2 for Linux, UNIX and Windows

Software version: B10

Reference #: IT24028

Modified date: 22 May 2018