IBM Support

OA49474: SRB INCORRECTLY PROMOTED DUE TO LOCK CONTENTION

A fix is available

APAR status

  • Closed as program error.

Error description

  • When z/OS (IEAVWUQ: WUQ add) detects a flood of SRBs, it can
    start scheduling new SRBs at the dispatching priority of the
    scheduler's work unit (WEB).
    
    If the scheduling work unit is promoted to dispatching priority
    (DPH) X'FF' for local or CML lock contention at the time it does
    the schedule, the scheduled work can inadvertently "inherit" the
    scheduler's promoted WEB DPH of X'FF'.
    
    If the scheduled work persists as a long-running SRB or as
    multiple SRBs, the effects of inheriting DPH X'FF' can vary.
    CPU delay and a system stall can be seen if multiple SRBs with
    DPH X'FF' are created.
    
     EXTERNAL SYMPTOMS: System stall.
    
     VERIFICATION STEPS:
     1) From the system trace table, note the CP on which the SRB in
        question is running.  Presumably this is an SRB that is
        consuming more CP time than expected.
    
     2) Issue: IPCS CBF LCCAnn   where "nn" is the decimal number
        of the processor that the SRB was running on.  Do: FIND CWEB
        within this LCCA report.
    
     3) Issue: IPCS CBF xxxxxxxx STR(WEB)   where xxxxxxxx is the
        address from LCCACWEB, obtained in step 2.  Verify that the
        WEB contains the following field settings:
    
           CMAJO.... 00FF   CMINO.... 00FF   PCTRL.... 01000000
    
        indicating that the unit of work is running at dispatching
        priority X'FF' and the WebPromotion_SRBsActive bit is ON.
    
     4) If there is a match on these fields, then you have confirmed
        that the unit of work was running at an elevated priority
        due to detected SRB flooding.  This priority should match
        the *normal* priority of the scheduling WEB.  This leads to
        the question of whether the scheduling WEB had a normal
        DP of FF, or whether normally (in the absence of local/CML
        lock promotion) it runs at a lower DP.  To get a probable
        indication of this, if you know what address space the
        scheduling unit of work lives in, check that address space's
        ASCBDPH (IP SUMM FORMAT report for that address space).  If
        the DPH field is less than X'FF', then the scheduling unit
        of work probably normally runs at a priority lower than
        X'FF', in which case this APAR should help.  Otherwise, if
        the scheduling unit of work's priority is normally X'FF',
        then this APAR will not help.
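
The field checks in steps 3 and 4 can be sketched as a small script. This is only an illustrative sketch, not an IBM-supplied tool: the function names are hypothetical, the input is assumed to be text copied from the IPCS WEB report in the "NAME.... value" form shown in step 3, and treating X'01000000' as a bit mask for WebPromotion_SRBsActive is an assumption based on the PCTRL value shown above.

```python
import re

# Assumption: PCTRL X'01000000' (from step 3) is treated as the bit mask
# for WebPromotion_SRBsActive. The APAR text only shows the full value.
SRBS_ACTIVE_MASK = 0x01000000


def parse_ipcs_fields(text):
    """Extract 'NAME.... HEX' pairs from copied IPCS report text (hypothetical helper)."""
    pairs = re.findall(r"([A-Za-z0-9_]+)\.+\s+([0-9A-Fa-f]+)", text)
    return {name: int(value, 16) for name, value in pairs}


def web_shows_symptom(fields):
    """Step 3: WEB runs at DP X'FF' and the SRBs-active promotion bit is on."""
    return (
        fields.get("CMAJO") == 0x00FF
        and fields.get("CMINO") == 0x00FF
        and bool(fields.get("PCTRL", 0) & SRBS_ACTIVE_MASK)
    )


def apar_likely_applies(ascb_dph):
    """Step 4: the scheduler's address space normally runs below X'FF'."""
    return ascb_dph < 0xFF


if __name__ == "__main__":
    sample = "CMAJO.... 00FF   CMINO.... 00FF   PCTRL.... 01000000"
    fields = parse_ipcs_fields(sample)
    print("WEB matches symptom:", web_shows_symptom(fields))
    # 0xC0 is a made-up ASCBDPH value used only for illustration.
    print("APAR likely applies:", apar_likely_applies(0xC0))
```

As step 4 notes, the ASCBDPH comparison gives only a probable indication, so a positive result from such a check should be read as a hint rather than a confirmation.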
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: Users running z/OS HBB7780 and above.        *
    ****************************************************************
    * PROBLEM DESCRIPTION: A system can stall during a flood of    *
    *                      SRBs, because an SRB is incorrectly     *
    *                      promoted to the highest dispatching     *
    *                      priority.                               *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    During a flood of SRBs, Supervisor performs SRB promotion,
    where an SRB uses the dispatching priority of its scheduler's
    workunit.  If the SRB scheduling workunit was temporarily
    promoted during lock contention, the SRB incorrectly inherits
    the scheduler's promoted dispatching priority.
    
    If the SRB is a long-running SRB, the incorrectly inherited
    dispatching priority can prevent non-system-level work from
    running, resulting in a stalled system.
    

Problem conclusion

  • Fixed SRB promotion processing to use the SRB scheduling
    workunit's normal dispatching priority (as if it were not
    promoted) if the scheduling workunit's dispatching priority
    was temporarily promoted during lock contention.
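
The behavioral change described above can be pictured with a toy model. This is a conceptual sketch only and does not reflect the actual Supervisor code: the WorkUnit class, its fields, and both functions are hypothetical names introduced for illustration, and the priority values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkUnit:
    """Hypothetical stand-in for a scheduling work unit (WEB)."""
    normal_dp: int                      # priority without any promotion
    promoted_dp: Optional[int] = None   # set only while promoted for lock contention

    @property
    def current_dp(self):
        # The priority the work unit is actually dispatched at right now.
        return self.promoted_dp if self.promoted_dp is not None else self.normal_dp


def srb_dp_before_fix(scheduler):
    """Old behavior: the SRB inherits whatever priority the scheduler has now."""
    return scheduler.current_dp


def srb_dp_after_fix(scheduler):
    """Fixed behavior: the SRB gets the scheduler's normal (unpromoted) priority."""
    return scheduler.normal_dp


if __name__ == "__main__":
    # A scheduler that normally runs at X'C0' but is temporarily at X'FF'.
    scheduler = WorkUnit(normal_dp=0xC0, promoted_dp=0xFF)
    print(hex(srb_dp_before_fix(scheduler)))
    print(hex(srb_dp_after_fix(scheduler)))
```

In this model, a scheduler whose normal priority is already X'FF' yields the same result from both functions, which is consistent with the note in the verification steps that the APAR does not help in that case.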
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    OA49474

  • Reported component name

    SUPERVISOR CONT

  • Reported component ID

    5752SC1C5

  • Reported release

    780

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2015-12-04

  • Closed date

    2016-01-15

  • Last modified date

    2016-02-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UA80346 UA80347 UA80348

Modules/Macros

  • IEAVEAT0 IEAVEDSR IEAVEGR  IEAVEJST IEAVEMIN
    IEAVEMRQ IEAVESC0 IEAVESVC IEAVINIT IEAVLKRM IEAVMPWQ IEAVSCHA
    IEAVSCHD IEAVWUQA IEAWEBP
    

Fix information

  • Fixed component name

    SUPERVISOR CONT

  • Fixed component ID

    5752SC1C5

Applicable component levels

  • R7A0 PSY UA80346

       UP16/01/27 P F601

  • R780 PSY UA80347

       UP16/01/27 P F601

  • R790 PSY UA80348

       UP16/01/27 P F601

Document information

More support for: z/OS family

Software version: 780

Operating system(s): MVS, z/OS

Reference #: OA49474

Modified date: 01 February 2016