IBM Support

II14016: DB2 RA10 RB10 RC10 HANG WAIT SUSPEND LOOP PROBLEM SUMMARY (FOR DB2 DISTRIBUTED HANG/WAIT SEE II08215 + II11164)


APAR status

  • INTRAN

Error description

  • The best way to gather documentation is to use a DUMP command
    parmlib member, IEADMCxx, where "xx" is the suffix you specify
    on the PARMLIB= operand of the DUMP command.
    The DUMP parmlib member avoids the use of SLIP IF (PER), which
    is limited to only one PER trap per system.
    **************************************************************
    The purpose of this APAR is to document the known
    DB2 RA10 RB10 RC10 5740XYR00 HANG-WAIT-SUSPEND problems.
    For HANG or WAIT problems in the DB2 DISTRIBUTED asid, also
    see II08215.  Note that if your problem is the failure of DB2
    to start up, DB2 does NOT function without IRLM.
    Verify that your IRLM can function and start up without DB2.
    If IRLM is in an indeterminate state wherein it cannot IDENTIFY
    the DB2, then DB2 cannot fully start up.  Check your SYSLOG
    for errors related to IRLM, or look for DXRxxxx type messages.
      In addition to a current fix-list, a process has been
    provided that indicates what to do when DB2 or an ALLIED asid
    is hung (or looping).  Please follow this process and have the
    listed documentation available for DB2 SUPPORT analysis.
    

Local fix

  • A new keyword 'service' has been added to the display thread
    command to assist in the diagnosis of DB2 thread hangs.  The
    command is issued as follows:
    -dis thd(*) service(wait)
    This will display all threads that have been suspended for
    2 times the IRLM timeout limit or a minimum of 60 seconds.
    If the thread is suspended due to IRLM resource contention
    or DB2 latch contention, additional information will be
    displayed to assist in identifying the problem.
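As a rough illustration of the threshold described above, the following Python sketch assumes that "2 times the IRLM timeout limit or a minimum of 60 seconds" means the larger of the two values. The function name and this reading of the rule are assumptions made for illustration only; they are not taken from DB2 itself.

```python
def service_wait_threshold_seconds(irlm_timeout_seconds: int) -> int:
    """Return the assumed suspension time, in seconds, after which a
    thread would be listed by -dis thd(*) service(wait).

    Assumption: the rule is max(2 * IRLM timeout, 60 seconds).
    """
    return max(2 * irlm_timeout_seconds, 60)
```

Under this reading, an IRLM timeout of 20 seconds would give a threshold of 60 seconds, and an IRLM timeout of 60 seconds would give 120 seconds.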
    WHAT TO DO IF DB2, OR A DB2 ALLIED ASID, IS HUNG
    REFERENCE SECTION: TYPE-of-failure keywords in THE DB2
    DIAGNOSIS REFERENCE GUIDE
    ________________________________________________________
      ALL OF THE BELOW DOCUMENTATION WILL BE NEEDED
     BY DB2 LEVEL2
    ----------------------------------------------------------------
    1) Displays: -DISPLAY THREAD(*) DETAIL
                 D A,ALL   (or  D A,ssnm*  or  D A,IRL*)
                 D GRS,CONTENTION
                 D OPDATA
       (If possible, also execute the following 2 DISPLAY cmds)
                 -DISPLAY DATABASE(*) USE/LOCKS LIMIT(*)
                 -DISPLAY UTILITY (*)
       *NOTE* Be sure to keep the MVS SYSLOG to enable reference
              to the  DISPLAY command's DSNV404I response output.
       *NOTE* A thread STATUS of  PT*  means that the thread is
             in DB2 and using Query CP Parallelism.  (DEGREE ANY)
    ----------------------------------------------------------------
    2) Obtain MVS console dumps using the MVS DUMP command:
       Always include the DB2 subsystem name in your DUMP cmd title.
             DUMP COMM=(DB2P thread 505 hung)
       (a) If you suspect that your WAIT or HANG may be due to
        a LOOP in DB2 on an ALLIED asid, make sure that the MVS
        INTERNAL SYSTEM TRACE is set ON, and that this trace is
        set to a good working DIAGNOSTIC value.
        Enter MVS commands:  TRACE ST,999K,BR=OFF
                             TRACE MT,264K
    .
    NOTE: If the system is on z/OS 1.10 or higher, TRACE ST
          could be nnnM or nG.
    Note: When dealing with problems of this nature, this internal
        system TRACE may be the most important diagnostic item
        captured in the dump.  Unless altered, TRACE default is 64K.
    
       (b) IF AN ALLIED ADDRESS SPACE IS HUNG:
        Determine if the relevant ALLIED asid is getting CPU cycles
        or if the asid is swapped out.
         - IF THIS ASID IS SWAPPED OUT -
           Take a CONSOLE dump of the MVS *MASTER* (asid 0001). The
           order is IMPORTANT: the MVS MASTER must be dumped first to
           ensure good dump data. After asid(1) is dumped, dump the
           ALLIED asid, IRLM, ssnmMSTR, and ssnmDBM1 from all members.
           Try to capture all of them in one dump.
           Use a joblist with wildcards in the DUMP cmd to get all
           members:
       JOBNAME=(*MASTER*,BADjob,XCFAS,ssnmIRLM,ssnmMSTR,ssnmDBM1),
             where ssnm is the subsystem name of the DB2 members.
           Use the REMOTE keyword to get dumps of all Datasharing
           members:
     REMOTE=(JOBLIST,SDATA,DSPNAME),DSPNAME=('ssnmIRLM'.*,'XCFAS'.*)
         - IF THIS ASID IS NOT SWAPPED OUT -
           The MVS MASTER (asid 0001) is not required. Dump the ALLIED,
           ssnmMSTR, ssnmDBM1, XCFAS, and IRLM asids of all members.
         - If the D GRS,C command shows an address space contending
           with DB2 for a resource, dump that address space first.
       (c) IF DB2 IS HUNG:
      Take a CONSOLE dump of ssnmMSTR, ssnmDBM1, IRLM, & XCFAS .
    SDATA=(RGN,CSA,SQA,LPA,LSQA,SWA,PSA,ALLNUC,XESDATA,TRT,GRSQ,SUM)
        Use MVS command:  D D,OPTIONS  to check SDUMP defaults.
    (if using or considering SLIP, read II10850 and/or PN80921 )
    
    To facilitate accurate and timely diagnosis of a reported
    problem, it is imperative that the user produce COMPLETE dumps
    of the failure. PARTIAL dumps waste valuable time and are
    usually inadequate for full problem diagnosis. Always dump the
    DB2MSTR, DB2DBM1, and IRLM asids; other DB2 asids (DIST and
    SPAS) might also be needed.
    
    Check for the MSGIEA911E message after the DUMP command is issued.
    The dump may take a minute or so to complete. When finished, MVS
    will issue the IEA911E message noting the condition of the dump.
    The condition will be either COMPLETE or PARTIAL.  The message
    can be MSGIEA611I if the dump was allocated through DYNALLOC.
    Another message to be aware of is  MSGIEA043I MAXSPACE REACHED.
    This indicates a PARTIAL dump. At minimum, set MAXSPACE on
    systems that run DB2 to a reasonable level, using these MVS
    commands:
           DISPLAY  : D D,OPTIONS
           CHNGDUMP : CD SET,SDUMP,TYPE=XMEME,MAXSPACE=8000M
                      MAXSPACE=8000M minimum for DB2 z/OS V8 & V9
    .
                      V10 and above would require 16000M or more.
                      MAXSPACE should be set when the system permits.
                      There is still a chance of getting a partial
                      dump, depending on the customer configuration.
    .
     ||NOTE|| Should you see a partial dump with 16G of storage in
              V10, contact IBM DB2 SYSTEMS support.
    .
      Please be sure to have z/OS APARs
      OA40015, OA39596/OA41315, OA40856, OA41994
      to avoid the partial dumps in V10 systems.
    .
     Note: See II06471 : DUMPSRV uses AUX storage for dumping, you
           may need to add an extra PAGE dataset when dumping DBM1.
     Note: Allocate a hi-capacity device like a 3390 mod9 for dumps.
    **use ACS routines for size ***
     Note: With DFSMS120 and Dynamic Dump Allocation, multi-volume
           EFDS format datasets can be created for your SVCDUMP.  **
    ----------------------------------------------------------------
    ----------------------------------------------------------------
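The dump completion check described in step 2 can be sketched in Python as follows. This is a minimal illustration, assuming the SYSLOG has already been extracted into a list of text lines and that the words COMPLETE or PARTIAL appear on the same line as the IEA911E or IEA611I message ID. The function name and the sample lines are hypothetical and are not captured system output.

```python
import re

def classify_dump_messages(syslog_lines):
    """Return 'COMPLETE', 'PARTIAL', or 'UNKNOWN' for a list of SYSLOG lines.

    IEA043I (MAXSPACE REACHED) is treated as PARTIAL, as the text states.
    IEA911E and IEA611I are classified by the condition word on the line.
    """
    status = "UNKNOWN"
    for line in syslog_lines:
        if "IEA043I" in line:
            return "PARTIAL"
        if "IEA911E" in line or "IEA611I" in line:
            if re.search(r"\bPARTIAL\b", line):
                return "PARTIAL"
            if re.search(r"\bCOMPLETE\b", line):
                status = "COMPLETE"
    return status

# Illustrative sample line only.
sample = ["IEA911E COMPLETE DUMP ON SYS1.DUMP01"]
print(classify_dump_messages(sample))
```

Any PARTIAL result means the dump should be retaken after MAXSPACE and dump data set capacity have been reviewed as described above.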
    3) RECYCLE DB2:
       If the CANCEL of a hung thread is not successful, or if
       DB2 is hung, execute the following commands in the noted
       order until one of the commands accommodates your need:
         (ssnm is the DB2 subsystem name)
    
      A. -STOP DB2 MODE(QUIESCE)
      B. -STOP DB2 MODE(FORCE)
      C. If ssnmDIST is running, issue the MVS command:
           CANCEL ssnmDIST,A=xx
         or modify IRLMPROC with an abend using the command:
           F IRLMPROC,ABEND,NODUMP
      D. CANCEL ssnmDBM1,A=yy (issue 2 consecutive CANCEL commands)
             If Cancel ssnmDBM1 does not work then -
      E. CANCEL ssnmMSTR,A=zz
         If CANCEL ssnmMSTR does not work, then -
       There is always a FORCE ssnmMSTR,ARM to use as noted earlier,
       but we recommend avoiding its use.  IRLM can remain in an
       indeterminate state and you may not be able to restart DB2
       before an IPL is done. Use the MVS command  FORCE jobname
       as a LAST resort.  This FORCE command may need to be issued
       several times before the wanted job finally terminates with
       MSGIEF404I.  OEM products like RESOLVE and KILL can be used
       in lieu of this MVS FORCE command.  Use MVS display commands
       (D A,ALL) to verify that the DB2/IRLM STC job and ASIDs are
       no longer active to MVS.
    ----------------------------------------------------------------
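The escalation order in step 3 can be written down as an ordered list, which may help when preparing an operator runbook. The Python sketch below only builds command strings; it does not issue anything. The function name is hypothetical, the A=xx, A=yy, and A=zz placeholders are kept exactly as in the text, and treating the ssnmDIST cancel and the IRLMPROC modify as two alternatives within step C is an assumption about how that step is meant to be read.

```python
def recycle_escalation(ssnm, dist_running=True):
    """Return (step, command) pairs in the order given in step 3.

    ssnm is the DB2 subsystem name. ASID operands (A=xx, A=yy, A=zz)
    are left as placeholders to be filled in from D A,ALL output.
    """
    steps = [
        ("A", "-STOP DB2 MODE(QUIESCE)"),
        ("B", "-STOP DB2 MODE(FORCE)"),
    ]
    if dist_running:
        steps.append(("C", f"CANCEL {ssnm}DIST,A=xx"))
    steps.append(("C", "F IRLMPROC,ABEND,NODUMP"))
    # Step D calls for two consecutive CANCEL commands.
    steps.append(("D", f"CANCEL {ssnm}DBM1,A=yy"))
    steps.append(("D", f"CANCEL {ssnm}DBM1,A=yy"))
    steps.append(("E", f"CANCEL {ssnm}MSTR,A=zz"))
    # FORCE is a last resort and is not recommended.
    steps.append(("LAST RESORT", f"FORCE {ssnm}MSTR,ARM"))
    return steps

for step, command in recycle_escalation("DB2P"):
    print(step, command)
```

Each command is tried only if the previous one did not meet the need, so the list is meant to be walked from the top and stopped at the first command that works.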
    4) Use the MVS command  SETDMN  to verify  DOMAIN parameters.
       If MAX and MIN are set TOO low then the DB2 subsystem will
       not stop and start cleanly, ie:   SETDMN MAX=200,MIN=255
    ----------------------------------------------------------------
    ----------------------------------------------------------------
    5) OBTAIN SYS1.LOGREC:
       Use IFCEREP1 service aid to obtain DETAILed software event
       records for at least 1 hour prior to the error of note.
       You may find it beneficial to first run a HISTORY report.
       Note: If DB2 is FORCEd down, or abends in some way, expect
             to see MVS CROSS-Memory errors like S0D5 S0D6 S0D7
             and S058 S0E0.  There may also be several TASK term
             S13E errors logged in LOGREC.  Do NOT interpret any
             of these secondary recoveries as the source of your
             DB2 subsystem outage concern.  DB2 generated SOFT
             CANCEL entries like rc00E50013 may also be issued.
    ----------------------------------------------------------------
    6) If it appears that IRLM is hung, DB2 will most likely be hung
       along with associated DB2 jobs (threads). It may be necessary
       to obtain IRLM doc to diagnose the hang.  This is especially
       critical in a DataSharing environment. Get a console dump of
       the DB2MSTR, DB2DBM1, and IRLM asids, or see SLIP info II10850.
       Run with IRLM component traces active.
    ----------------------------------------------------------------
    7) Check the PSP upgrades for HIPER fixes associated with
       your DB2 release (UPGRADEs :DB2810 DB2710 DB2610 )
       To prevent unexpected DB2 outages caused by a WAIT / HANG /
       LOOP / SUSPEND, the DB2 SUPPORT TEAM highly recommends that
       the following APARS be applied to your ESA or OS390 SYSTEM:
    ----------   Maintenance/Info APAR   -----------------------
    II06310           DUMPSRV info plus fixlist (II06226)
    II05402           CANCEL command fails to complete
    II10817           DB2 R610 R710 R810 STORAGE USAGE FIXLIST
    e-Support web site:   http://www-3.ibm.com/software/data/db2/os3
    Technical information is categorized so you can navigate directl
    COMMENTS:
    CLOSED FOR DB2INFO RETENTION:  See II04309 for DB2 storage info
     and more DB2 diagnostic setup.
    ________________________________________________________________
    

Problem summary

Problem conclusion

Temporary fix

Comments

APAR Information

  • APAR number

    II14016

  • Reported component name

    PB LIB INFO ITE

  • Reported component ID

    INFOPBLIB

  • Reported release

    001

  • Status

    INTRAN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2005-03-29

  • Closed date

  • Last modified date

    2018-03-29

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels


Document Information

Modified date:
29 March 2018