IBM Support

VM66357: VSWITCH DEADLOCK AFTER A SERIES OF CONTROLLER STALLS

A fix is available

APAR status

  • Closed as program error.

Error description

  • A series of VSWITCH controller stalls may result in a deadlock
    (blocking network activity for that VSWITCH).
    
    Problems with the OSA hardware or CP system resources may cause
    VSWITCH controller stalls (on users DTCVSW*) which are normally
    resolved automatically. In some cases, the stall recovery hangs
    and prevents the controller from restoring connectivity for this
    VSWITCH.
    

Local fix

  • Apply PTF
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All customers using a z/VM VSwitch for guest *
    *                 network connectivity.                        *
    *                                                              *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    ****************************************************************
    * RECOMMENDATION: APPLY PTF                                    *
    ****************************************************************
    z/VM may become unresponsive, requiring a system IPL to
    recover, when a VSwitch is used for virtual machine network
    connectivity and multiple VSwitch controller stalls occur.
    
    Problems with the OSA hardware or CP system resources may cause
    VSWITCH controller stalls (on users DTCVSW*), which are
    normally resolved automatically. In some cases, the stall
    recovery can hang, preventing the controller from restoring
    external network connectivity for all VSwitches. Once this
    issue is encountered, any virtual machine device configuration
    change, including LOGON/LOGOFF, will cause the virtual machine
    to hang.
    

Problem conclusion

  • z/VM's recovery for a VSwitch controller stall automatically
    detaches all networking devices from the stalled controller and
    attaches them to another functional controller. When multiple
    controllers stall, an excessive number of ATTACH/DETACH
    operations run concurrently. This activity exposes a lock
    hierarchy issue between the networking and I/O locks, which
    results in a deadlock that requires a system IPL to recover.
    
    The VSwitch recovery logic executing the DETACH/ATTACH
    operations is not using Console Function Mode (CFM)
    serialization required by the I/O Subsystem to serialize this
    processing.  This results in a deadlock between the I/O VM
    Configuration Lock and the networking Switch Eligible Table Lock
    (SLMSWLCK).
    
    The VSwitch recovery logic is modified to use CFM serialization
    when performing both DETACH and ATTACH operations.
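
    The lock hierarchy problem described above can be illustrated
    with a minimal Python sketch. It is not z/VM code: the function
    run, the lock names, and the order in which each path takes its
    locks are assumptions made for illustration only. Two threads
    take two locks in opposite orders. Without a common serialization
    gate (standing in for CFM serialization), each ends up holding
    the lock the other needs. With the gate, the two paths run one at
    a time and both complete.

```python
import threading


def run(serialized: bool, timeout: float = 0.2) -> dict:
    """Run two paths that take two locks in opposite order.

    Returns a dict mapping each path name to True if it obtained both locks.
    All names are hypothetical stand-ins, not z/VM interfaces.
    """
    config_lock = threading.Lock()  # stand-in for the I/O VM Configuration Lock
    switch_lock = threading.Lock()  # stand-in for SLMSWLCK
    gate = threading.Lock()         # stand-in for CFM serialization
    both_hold_first = threading.Barrier(2)
    both_attempted = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        if serialized:
            # The gate lets only one path touch the two locks at a time,
            # so the opposite acquisition orders can no longer interleave.
            with gate:
                with first:
                    with second:
                        results[name] = True
            return
        with first:
            # Force the bad interleaving: both paths hold their first lock.
            both_hold_first.wait()
            # A timeout is used so the sketch reports the deadlock
            # instead of hanging forever.
            got = second.acquire(timeout=timeout)
            if got:
                second.release()
            results[name] = got
            # Keep holding the first lock until both attempts are finished.
            both_attempted.wait()

    threads = [
        threading.Thread(target=worker,
                         args=("recovery", switch_lock, config_lock)),
        threading.Thread(target=worker,
                         args=("device_change", config_lock, switch_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print("unserialized:", run(serialized=False))
    print("serialized:  ", run(serialized=True))
```

    In the unserialized run neither path obtains its second lock,
    which corresponds to the hang described above. The serialized
    run trades some concurrency for a guarantee that the two
    acquisition orders never overlap.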
    

Temporary fix

Comments

  • **** AE21/03/29 FIX IN ERROR. SEE APAR VM66509 FOR DESCRIPTION
    

APAR Information

  • APAR number

    VM66357

  • Reported component name

    VM CP

  • Reported component ID

    568411202

  • Reported release

    640

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-01-08

  • Closed date

    2020-03-10

  • Last modified date

    2021-06-29

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UM35609 UM35610 UM35611

Modules/Macros

  • HCPDTD   HCPIQR   HCPLAN   HCPMES   HCPMESA  HCPMESB  HCPMXRBK
    HCPSWC   HCPSWI   HCP2832E
    

Fix information

  • Fixed component name

    VM CP

  • Fixed component ID

    568411202

Applicable component levels

  • RA64 PSY UM35812

       UP21/02/17 I 1000 ¢

  • R640 PSY UM35610

       UP20/03/19 I 1000 ¢

  • R710 PSY UM35611

       UP20/03/19 P 2101 ¢

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
30 June 2021