II06335: DB2 R710 R610 R510 HANG WAIT SUSPEND LOOP PROBLEM SUMMARY (FOR DB2 DISTRIBUTED HANG/WAIT SEE II08215 + II11164)

APAR status

  • Closed as canceled.

Error description

  • ---============================================---
    = for releases above R610 see APAR II14016 =
    ---============================================---
    (last update 05/23/02 lsk)
    **************************************************************
    The best way to get documentation is to use parmlib member
    IEADMCxx. Dump parmlib support exists in OS/390 R2.5 and up,
    where DUMP commands can be pre-set up in parmlib members
    (IEADMCxx), similar to how SLIP traps were used in the
    original MVS 5.1.0 timeframe. The DUMP parmlib avoids the use
    of SLIP IF (PER), which is limited to only one PER trap per
    system. In IEADMCxx, "xx" is the suffix you specify on the
    PARMLIB= operand of the DUMP command.
    **************************************************************
       The purpose of this APAR is to document the known DB2 R710
    R610 R510 -  5740XYR00 HANG-WAIT-SUSPEND problems.  For HANG
    or WAIT problems in DB2 DISTRIBUTED asid also see II08215.
       Note: if your problem is that DB2 fails to start up,
    remember that DB2 does NOT function without IRLM. Verify that
    your IRLM can start up and function without DB2. If IRLM is
    in an indeterminate state in which it cannot IDENTIFY the
    DB2, then DB2 cannot fully start up. Check your SYSLOG for
    errors related to IRLM, or look for DXRxxxx type messages.
       In addition to a current fix-list, a process has been
    provided that indicates what to do when DB2 or an ALLIED asid
    is hung (or looping). Please follow this process and have the
    listed documentation available for DB2 SUPPORT analysis.
       (**NOTE** DB2 R410 supplies a new CANCEL THREAD command)
    ----------------------------------------------------------------
             WHAT TO DO IF DB2, OR DB2 ALLIED ASID IS HUNG
          (REFERENCE PAGE 80 OF THE DB2 R310 DIAGNOSIS GUIDE)
      ALL OF THE BELOW DOCUMENTATION WILL BE NEEDED BY DB2 LEVEL2
    ----------------------------------------------------------------
    1) Displays: -DISPLAY THREAD(*) DETAIL
       or
       -DISPLAY THREAD(*) SERVICE(WAIT)   with UK02845/UK02846
          SERVICE(WAIT) displays all threads that have been
          suspended for 2 times the IRLM timeout limit, or a
          minimum of 60 seconds. If a thread is suspended due to
          IRLM resource contention or DB2 latch contention,
          additional information is displayed to assist in
          identifying the problem.
                 D A,ALL   (or  D A,ssnm*  or  D A,IRL*)
                 D GRS,CONTENTION
                 D OPDATA
       (If possible, also execute the following DISPLAY cmds)
               -DISPLAY DATABASE(*) SPACENAM(*) CLAIMERS LIMIT(*)
               -DISPLAY DATABASE(*) SPACENAM(*) USE      LIMIT(*)
               -DISPLAY DATABASE(*) SPACENAM(*) LOCKS    LIMIT(*)
               -DISPLAY UTILITY (*)
       *NOTE* Be sure to keep the MVS SYSLOG to enable reference
              to the  DISPLAY command's DSNV404I response output.
       *NOTE* A thread STATUS of  PT*  means that the thread is
              in DB2 and using Query CP Parallelism.  (DEGREE ANY)
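    The SERVICE(WAIT) reporting rule in step 1 can be written as a
    small formula. A minimal Python sketch, assuming that "2 times
    the IRLM timeout limit or a minimum of 60 seconds" means the
    larger of the two values, and that a thread at exactly the
    threshold is reported. Both readings and the function names are
    assumptions for illustration, not DB2 internals.

```python
def service_wait_threshold(irlm_timeout_s: float) -> float:
    """Seconds of suspension before -DIS THD(*) SERVICE(WAIT) shows a thread.

    Assumed reading of the text: twice the IRLM timeout, never below 60s.
    """
    return max(2.0 * irlm_timeout_s, 60.0)


def is_reported(suspended_s: float, irlm_timeout_s: float) -> bool:
    """Hypothetical check: has the thread been suspended long enough?"""
    return suspended_s >= service_wait_threshold(irlm_timeout_s)


if __name__ == "__main__":
    # A 60-second IRLM timeout gives a 120-second threshold;
    # a 20-second timeout is raised to the 60-second minimum.
    print(service_wait_threshold(60), service_wait_threshold(20))
```

    With a short IRLM timeout, the 60-second floor is what decides
    whether a suspended thread appears in the display.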
    ----------------------------------------------------------------
    2) Obtain MVS console dumps using the MVS DUMP command:
       Always include the DB2 subsystem name in your DUMP cmd title.
             DUMP COMM=(DB2P thread 505 hung)
       (a) If you suspect that your WAIT or HANG may be due to
        a LOOP in DB2 on an ALLIED asid, make sure that the MVS
        INTERNAL SYSTEM TRACE is set ON, and that this trace is
        set to a good working DIAGNOSTIC value.  See II08023.
        Enter MVS commands:  TRACE ST,128K,BR=OFF
                             TRACE MT,264K
    Note: when dealing with problems of this nature, this internal
        system TRACE may be the most important diagnostic item
        captured in the dump. Unless altered, the TRACE default
        is 64K.
    
       (b) IF AN ALLIED ADDRESS SPACE IS HUNG:
        Determine if the relevant ALLIED asid is getting CPU cycles
        or if the asid is swapped out.
         - IF THIS ASID IS SWAPPED OUT -
     Take a CONSOLE dump of the MVS *MASTER* (asid 0001). The order
     is IMPORTANT: the MVS MASTER must be dumped first to ensure
     good dump data. After asid(1), dump the ALLIED asid, IRLM,
     ssnmMSTR and ssnmDBM1 from all members. Try to capture them
     all in one DUMP command. Use a joblist with wildcards in the
     DUMP cmd to get all members:
       JOBNAME=(*MASTER*,BADjob,XCFAS,ssnmIRLM,ssnmMSTR,ssnmDBM1),
         where ssnm is the subsystem name of the DB2 members.
     Use the REMOTE keyword to get dumps of all data sharing members.
     REMOTE=(JOBLIST,SDATA,DSPNAME),DSPNAME=('ssnmIRLM'.*,'XCFAS'.*)
    - IF THIS ASID IS NOT SWAPPED OUT -
      The MVS MASTER (asid 0001) is not required; dump the ALLIED,
      ssnmMSTR, ssnmDBM1, XCFAS, and IRLM asids of all members.
    (c) IF DB2 IS HUNG:
      Take a CONSOLE dump of ssnmMSTR, ssnmDBM1, IRLM, and XCFAS.
    SDATA=(RGN,CSA,SQA,LPA,LSQA,SWA,PSA,ALLNUC,XESDATA,TRT,GRSQ,SUM)
        Use MVS command:  D D,OPTIONS  to check SDUMP defaults.
    (if using or considering SLIP, read II10850 and/or PN80921 )
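    The choice of address spaces in steps 2(b) and 2(c) can be
    sketched as a small function that builds the JOBNAME= operand.
    This is a hypothetical Python helper, not an IBM tool. It
    assumes each member's IRLM runs as job ssnmIRLM (the naming
    used in the JOBNAME= example), and it only assembles the
    operand text: SDATA, REMOTE, and the DUMP reply's length and
    continuation rules are not handled.

```python
from typing import List, Optional


def dump_jobnames(members: List[str],
                  allied_job: Optional[str] = None,
                  allied_swapped_out: bool = False) -> List[str]:
    """Job names to dump, in the order given in steps 2(b) and 2(c).

    members:    DB2 subsystem names (ssnm) of every member to dump
    allied_job: name of the hung allied job, or None when DB2 itself is hung
    """
    names: List[str] = []
    if allied_job is not None:
        if allied_swapped_out:
            names.append("*MASTER*")  # asid 0001 must be dumped first
        names.append(allied_job)
    names.append("XCFAS")
    for ssnm in members:
        # Assumption: IRLM runs as job ssnmIRLM, as in the JOBNAME= example.
        names += [ssnm + "IRLM", ssnm + "MSTR", ssnm + "DBM1"]
    return names


def jobname_operand(names: List[str]) -> str:
    """Format the list as a JOBNAME=(...) operand."""
    return "JOBNAME=(" + ",".join(names) + ")"


if __name__ == "__main__":
    print(jobname_operand(dump_jobnames(["DB2P"], "BADJOB", True)))
    # JOBNAME=(*MASTER*,BADJOB,XCFAS,DB2PIRLM,DB2PMSTR,DB2PDBM1)
```

    Only the *MASTER* position is order-critical in the text; the
    rest of the list simply follows the JOBNAME= example.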
    
    To facilitate accurate and timely diagnosis of a reported
    problem, it is imperative that you produce COMPLETE dumps of
    the problem. PARTIAL dumps only add confusion, waste valuable
    time, and are usually deemed inadequate for full problem
    diagnosis. Always dump the ssnmMSTR asid along with IRLM or
    other DB2 asids. A COMPLETE dump with MSTR is MANDATORY.
    Take note:
         An IBM 3390 mod3 has 3339 cylinders. DUMPSRV writes the
         dump in 4160 byte records, 686400 bytes per cylinder, so
         writing a 500 megabyte dump requires at least 764
         cylinders on this 3390-3 DASD. Most ssnmDBM1 dumps are
         greater than 500 megabytes. Check your MVS/ESA manual:
            Planning: Problem Determination and Recovery
         An SVCDUMP must reside on 1 single DASD volume, has
         DSORG=PS, and this non-VSAM dump dataset can have up to
         16 extents. So, with MVS/ESA 4, it is recommended that
         you allocate your DUMPxx datasets with a secondary
         allocation value set accordingly. Example:
            SPACE=(CYL,(900,700),RLSE,CONTIG)
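    The space figures above can be checked with a short
    calculation. A Python sketch, using only the numbers given in
    this section; it assumes 1 megabyte = 1024*1024 bytes (the
    reading that reproduces the 764-cylinder figure) and that each
    primary or secondary allocation is satisfied in a single
    extent. The function names are illustrative.

```python
import math

BYTES_PER_RECORD = 4160        # DUMPSRV record size, from the text above
BYTES_PER_CYLINDER = 686_400   # from the text above (165 records)
CYLS_3390_3 = 3339             # cylinders on a 3390 mod3
MAX_EXTENTS = 16               # extent limit for the PS dump dataset
MB = 1024 * 1024               # the 764-cylinder figure implies 2**20 bytes


def cylinders_needed(dump_mb: float) -> int:
    """Minimum whole cylinders needed to hold a dump of dump_mb megabytes."""
    return math.ceil(dump_mb * MB / BYTES_PER_CYLINDER)


def max_cylinders(primary: int, secondary: int,
                  volume_cyls: int = CYLS_3390_3) -> int:
    """Upper bound on dump dataset size in cylinders.

    One primary plus 15 secondary extents, capped by the volume size,
    because an SVC dump must fit on a single DASD volume.
    """
    return min(primary + (MAX_EXTENTS - 1) * secondary, volume_cyls)


if __name__ == "__main__":
    print(cylinders_needed(500))     # 764, matching the text
    print(max_cylinders(900, 700))   # 3339: the 3390-3 itself is the limit
```

    The second result shows that on a 3390-3 the SPACE example can
    never grow past the volume, which is why a higher-capacity
    device is suggested for dumps.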
    
    Check for the MSGIEA911E message after the DUMP command is
    issued. The dump may take a minute or so to complete. When
    finished, MVS issues the IEA911E message noting the condition
    of the dump, either COMPLETE or PARTIAL. The message can be
    MSGIEA611I instead if the dump had been allocated through
    DYNALLOC. Another message to be aware of is MSGIEA043I
    MAXSPACE REACHED, which indicates a PARTIAL dump. At minimum,
    set the SDUMP options (such as MAXSPACE) on the system DB2 is
    using to a reasonable level; see these MVS commands:
           DISPLAY  : D D,OPTIONS
           CHNGDUMP : CD SET,SDUMP,TYPE=XMEME,MAXSPACE=16000M
     Note: See II06471 : DUMPSRV uses AUX storage for dumping, you
           may need to add an extra PAGE dataset when dumping DBM1.
     Note: Allocate a hi-capacity device like a 3390 mod9 for dumps.
     Note: With DFSMS120 and Dynamic Dump Allocation, multi-volume
           EFDS format datasets can be created for your SVCDUMP.
           This DFSMS function is the 'RECOMMENDED' capture method.
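    Checking SYSLOG for the completion messages above can be
    automated. A minimal Python sketch, assuming the status word
    (COMPLETE or PARTIAL) immediately follows the IEA911E or
    IEA611I message id, and that the lines passed in belong to a
    single dump. The exact message layout should be confirmed
    against the MVS messages manual; the sample lines used to test
    this are made up.

```python
import re
from typing import Iterable, Optional

# Assumed shape: "IEA911E COMPLETE DUMP ON ..." / "IEA611I PARTIAL DUMP ON ..."
_STATUS = re.compile(r"\bIEA(?:911E|611I)\s+(COMPLETE|PARTIAL)\b")


def dump_status(syslog_lines: Iterable[str]) -> Optional[str]:
    """Return 'COMPLETE', 'PARTIAL', or None if no completion message is seen.

    Any IEA043I (MAXSPACE reached) line forces 'PARTIAL', whatever the
    completion message says.
    """
    status: Optional[str] = None
    maxspace_hit = False
    for line in syslog_lines:
        if "IEA043I" in line:
            maxspace_hit = True
        match = _STATUS.search(line)
        if match:
            status = match.group(1)
    if maxspace_hit:
        return "PARTIAL"
    return status
```

    A PARTIAL result is the signal to revisit MAXSPACE and the
    dump dataset allocation before the next attempt.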
    ----------------------------------------------------------------
    3) RECYCLE DB2:
       If the CANCEL of a hung thread is not successful, or if
       DB2 is hung, execute the following commands in the noted
       order until one of them accommodates your need:
         (ssnm is the DB2 subsystem name)
    
      A. -STOP DB2 MODE(QUIESCE)
      B. -STOP DB2 MODE(FORCE)
      C. If ssnmDIST is running do MVS command: CANCEL ssnmDIST,A=xx
         or
         Modify IRLMPROC with abend using command:
         F IRLMPROC,ABEND,NODUMP
    This will tell IRLM to quit, and remove the IDENTIFY between
    DB2 & IRLM.
      D. CANCEL ssnmDBM1,A=yy (issue 2 consecutive CANCEL commands)
             If CANCEL ssnmDBM1 does not work, then -
      E. CANCEL ssnmMSTR,A=zz
        If CANCEL ssnmMSTR does not work, then -
       there is always FORCE ssnmMSTR,ARM, but we recommend
       avoiding its use. IRLM can remain in an indeterminate
       state, and you may not be able to restart DB2 until an
       IPL is done. Use the MVS command FORCE jobname only as a
       LAST resort. This FORCE command may need to be issued
       several times before the wanted job finally terminates with
       MSGIEF404I. OEM products like RESOLVE and KILL can be used
       in lieu of this MVS FORCE command. Use MVS display commands
       (D A,ALL) to verify that the DB2/IRLM STC jobs and ASIDs
       are no longer active to MVS.
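    The escalation in step 3 can be written out as an ordered
    command list for a given subsystem. A hypothetical Python
    sketch: it assumes "-" is the DB2 command prefix (as in the
    text; many sites use another), treats the two options in
    step C as either/or (CANCEL ssnmDIST when it is running,
    otherwise the IRLM ABEND), and takes the ASIDs from a D A,ALL
    display. FORCE is deliberately not generated.

```python
from typing import Dict, List


def recycle_commands(ssnm: str, irlmproc: str,
                     asids: Dict[str, str]) -> List[str]:
    """Commands for steps A through E of step 3, in escalation order.

    Work down the list and stop as soon as DB2 is down. `asids` maps
    job names to the hex ASIDs shown by D A,ALL; ssnmDIST is treated
    as running only if it appears in `asids`.
    """
    dist, dbm1, mstr = ssnm + "DIST", ssnm + "DBM1", ssnm + "MSTR"
    cmds = ["-STOP DB2 MODE(QUIESCE)",      # A
            "-STOP DB2 MODE(FORCE)"]        # B
    if dist in asids:                       # C: ssnmDIST is running
        cmds.append(f"CANCEL {dist},A={asids[dist]}")
    else:                                   # C: the IRLM alternative
        cmds.append(f"F {irlmproc},ABEND,NODUMP")
    cmds += [f"CANCEL {dbm1},A={asids[dbm1]}"] * 2   # D: two consecutive CANCELs
    cmds.append(f"CANCEL {mstr},A={asids[mstr]}")    # E
    return cmds
```

    Keeping FORCE out of the generated list mirrors the advice to
    use it only as a last resort.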
    ----------------------------------------------------------------
    4) Use the MVS command  SETDMN  to verify  DOMAIN parameters.
       If MAX and MIN are set TOO low then the DB2 subsystem will
       not stop and start cleanly, ie:   SETDMN MAX=200,MIN=255
    ----------------------------------------------------------------
    5) OBTAIN SYS1.LOGREC:
       Use IFCEREP1 service aid to obtain DETAILed software event
       records for at least 1 hour prior to the error of note.
       You may find it beneficial to first run a HISTORY report.
       Note: If DB2 is FORCEd down, or abends in some way, expect
             to see MVS CROSS-Memory errors like S0D5 S0D6 S0D7
             and S058 S0E0.  There may also be several TASK term
             S13E errors logged in LOGREC.  Do NOT interpret any
             of these secondary recoveries as the source of your
             DB2 subsystem outage concern.  DB2 generated SOFT
             CANCEL entries like rc00E50013 may also be issued.
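    The "at least 1 hour prior" LOGREC window can cross midnight,
    so its start and end may fall on different dates. A Python
    sketch that computes the window and formats dates as yyddd
    (two-digit year, three-digit day of year). That yyddd is what
    IFCEREP1's DATE selection expects is an assumption here; check
    the EREP User's Guide for the exact DATE/TIME syntax before
    building a job from these values.

```python
from datetime import datetime, timedelta
from typing import Tuple


def logrec_window(error_time: datetime, hours_before: float = 1.0,
                  minutes_after: float = 0.0) -> Tuple[datetime, datetime]:
    """Start and end of the LOGREC interval to report on."""
    start = error_time - timedelta(hours=hours_before)
    end = error_time + timedelta(minutes=minutes_after)
    return start, end


def yyddd(ts: datetime) -> str:
    """Two-digit year plus three-digit day of year (Julian style)."""
    return ts.strftime("%y%j")


if __name__ == "__main__":
    start, end = logrec_window(datetime(2002, 5, 23, 0, 30))
    # The window crosses midnight, so it spans two Julian dates.
    print(yyddd(start), start.strftime("%H%M"), yyddd(end), end.strftime("%H%M"))
```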
    ----------------------------------------------------------------
    6) If it appears that IRLM is hung, DB2 will most likely be hung
       along with associated DB2 jobs (threads). It may be necessary
       to obtain IRLM doc to diagnose the hang.  This is especially
       critical in a DataSharing environment. See II10850 on how
       to obtain doc from all the members of the data sharing group.
          Run with IRLM component traces active. At the PN90337
       (or UN98783) maintenance level, specify TRACE=YES in the IRLM
       startup proc.
             Otherwise start the trace with the MVS command:
                   TRACE CT,ON,COMP=irlmnm
       (see apar pn01040 and the DB2 Commands manual)
    Issue F irlmproc,STATUS,ALLD  or ALLI to see status of members
    All DUMPs can be uploaded for Level2 download; read the README
    file at:    ftp://testcase.software.IBM.com  ( see II11945 )
    ---------------------------------------------------------------
    02/24/06 rjl
    Abend522 can occur in an allied address space during a call to
    DB2 if the processing of the call goes outside the allied task
    and no other activity occurs in the allied asid for the time
    specified in the JWT parameter of the SMFPRMxx parmlib member,
    and the time limit is not bypassed.
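    The S522 conditions in the paragraph above reduce to a simple
    predicate. A minimal Python sketch; the function name is
    hypothetical, and because this section does not state the unit
    or format of the JWT value in SMFPRMxx, the sketch takes it as
    an already converted timedelta. Whether an idle time exactly
    equal to JWT triggers the abend is also an assumption.

```python
from datetime import timedelta


def s522_conditions_met(allied_idle: timedelta, jwt: timedelta,
                        time_limit_bypassed: bool) -> bool:
    """True if the conditions described above for abend S522 hold.

    allied_idle:         time with no activity in the allied asid while the
                         DB2 call is processed outside the allied task
    jwt:                 the SMFPRMxx JWT value, converted to a timedelta
    time_limit_bypassed: True when the job is exempt from the wait limit
    """
    if time_limit_bypassed:
        return False
    return allied_idle > jwt  # strict '>' at the boundary is an assumption
```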
    .
    Collect and review logrec, syslog and any dumps taken for the
    s522.
    .
    If the documentation for the s522 shows DB2 csect DSNVSR, it
    indicates the activity under the allied task was suspended and
    processing is occurring under another task in an address space
    other than the allied asid. This is normal operation.
    .
    A SLIP trap may be needed to obtain a dump if an error is suspected.
    SLIP SET,C=522,ID=s522,A=SVCD,J=(name of job),
       SDATA=(RGN,CSA,SQA,LPA,LSQA,SWA,PSA,ALLNUC,TRT,GRSQ,SUM),END
    .
    If the csect name DSNX9WCA is present in one of the logrec
    entries for the error sequence, it indicates a DB2 stored
    procedure was called.
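    The two csect checks for an s522 can be kept as a small lookup
    table when triaging LOGREC and dump output. A Python sketch:
    the csect names are assumed to be already extracted from the
    reports (that parsing is not shown), and the table holds only
    the two names discussed in this section.

```python
from typing import Dict, Iterable, List

# Csect names and their meaning, taken from the paragraphs above.
S522_CSECT_NOTES: Dict[str, str] = {
    "DSNVSR": ("allied task suspended while work runs in another "
               "address space; this is normal operation"),
    "DSNX9WCA": "a DB2 stored procedure was called",
}


def interpret_s522_csects(csects: Iterable[str]) -> List[str]:
    """Notes for every known csect found in the s522 documentation."""
    seen = {name.strip().upper() for name in csects}
    return [note for name, note in S522_CSECT_NOTES.items() if name in seen]
```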
    ----------------------------------------------------------------
    8) Check the PSP upgrades for HIPER fixes associated with
       your DB2 release (UPGRADEs : DB2710 DB2610 DB2510 )
       To prevent unexpected DB2 outages caused by a WAIT / HANG /
       LOOP / SUSPEND, the DB2 SUPPORT TEAM highly recommends that
       the following APARS be applied to your ESA or OS390 SYSTEM:
               **** End of Documentation Process ****
    -------------------- Non DB2/IRLM Fixlist ---------------------
    See individual apar for ptf required on your system.
    
    -----------2002 ---------------
    OW47911 / UW77371 abend0C4 IGG0CLXE + '6A' or abend878
                       IGG0CLXA + 09E8 DB2 Utilities get rc00E50013
                       abend04e . Fix for FMID= HDZ11F0
    ----------  older maintenance -------------------------
    OY55972                  MVS/ESA/420 and above, performance
    OY65553                  Problem caused by MVS DUMP Services
    OY66146                  DDF REQUESTOR hung, SERVER rc00F30072
    OY64640                  TSO HANG DUE TO TIME LIMIT
    OW07856                  DB2 R410 with DFSMS120 and above
                             improves DB2 shutdown. VSAM MM CONNECT
    OW12181                  DB2 RECOVER OR REORG HANGS OR GETS
                             MSGDSNB224I 0E40 UNIT CHECK
    OW11968                  ACF/VTAM ERR CAUSES PARTIAL DUMPS
                             AND MEMTERM HANGS. OW13090 OW14381
    OW14392                  Parallel Detach(ENQ SYSZDSN3.DSNYALLI)
    OW14416                  S0C1 S0C4 IN IARFP + X'2092' IN R520
    OW17624                  CAS error. DB2 STIMER loop DSN3SSI2
    OW11787                  ESTAE recovery routines skipped
    OW18235                  IEAVESAR errs during RTM process.  LCR
    OW30124                  LowCore Refresh  IEAVESAR logrec entry
    OW19900                  Archive hung msgief238D WAIT NOHOLD
    OW23762                  DUMPSRV recursive S0C4 pic11 IEAVTSSM
    OW25038                  DUMPSRV recursive ABEND0C4 pic11
    II06310                  DUMPSRV info plus fixlist (II06226)
    II05402                  CANCEL command fails to complete
    OW26652                  NO DB2 startup see II04773
    OW28664                  Unpredictable results SLIP PVTMOD
    OW28828                  MSTR enqueue SYSZVOLS (see ow29984)
    OW32069                  OPENMVS DB2 SRBs hung asyncio
    OW31722                  Initiator hung DFSMS130 DSSB BDAM
    OW30322                  Hang   Status Stop SRB (see ow31712)
    OW30546                  S0C4 IECVEXCP unpredictable results
    OW30549                  IEAVTSKT err unpredictable results
    OW31485 UW45995          Agent suspended DSNB5FOR VMM preformat
    OW32616 OS390            WLM Enclave SRB performance
    OW32704                  Invalid Stack (see ow33986) ABEND073
    OW32277                  hangs, SRBtime Enclave (see ow33027)
    OW33628 V2R4 and below   RTM, Detach, Cancel failures
    OW34861                  DSNB1CLM loop, abends. Non CMOS device
    OW38170                  hang, s0c1, s0c4... various symptoms
    OW39670 OS390 V1R3+      ABEND073 rc08 RSM IARUA
    OW39930 OS390 V2R6+      Hangs, DYNALLOC errs rc0210 rc00C200E2
    ------------------------
    CA apar GO77614          STIMERM issued from ACFF7SNQ +1D4
                                hangs DB2 service task PMIOPC01.
    Contact CA support       Wait in csect CASR230D +A6E SRVTSK02
    -------------------------------------------------------------
        See II10348 for DB2 R410 R510  and IRLM R101 fixlist
    ----------------------------------------------------------------
    PQ24904          UQ28184 EDMpool corrupted hangs, loops, abends
    PQ25996          UQ30270 Hang, loop Join with Fetch
    

Local fix

Problem summary

Problem conclusion

Temporary fix

Comments

  • CLOSED FOR DB2INFO RETENTION:  See II04309 for DB2 storage info
                                   and more DB2 diagnostic setup.
    

APAR Information

  • APAR number

    II06335

  • Reported component name

    PB LIB INFO ITE

  • Reported component ID

    INFOPBLIB

  • Reported release

    001

  • Status

    CLOSED CAN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    1992-09-11

  • Closed date

    1995-06-21

  • Last modified date

    2014-08-13

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information


Document Information

Modified date:
13 December 2020