IBM Support

VM66487: ABENDSXA004 GUEST USE OF SIMULATED AND REAL QDIO DEVICES

A fix is available


APAR status

  • Closed as program error.

Error description

  • When a virtual machine with multiple virtual CPUs uses one or
    more simulated VSwitch NIC devices for network connectivity
    together with a dedicated real QDIO device, the system may
    terminate with an SXA004 abend.  A real dedicated QDIO device
    can be an OSA-Express, HiperSockets, or FCP device.
    A VAP002 abend is also a symptom of this problem.
    Other symptoms include a slow or unresponsive system.  If a
    SNAPDUMP is taken, it may show many calls to HCPVAI+9D2.
    

Local fix

  • N/A
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All customers using a z/VM VSwitch for guest *
    *                 network connectivity in addition to a        *
    *                 dedicated QDIO capable device such as an OSA *
    *                 Express, HiperSockets, or a FCP Device.      *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    ****************************************************************
    * RECOMMENDATION: APPLY PTF                                    *
    ****************************************************************
    z/VM may terminate with an SXA004 ABEND forcing a system IPL to
    recover, when a virtual machine is using both a VSwitch to
    provide network connectivity and real dedicated QDIO Devices
    like OSA-Express, HiperSockets or FCP.
    
    QDIO devices use a special form of I/O Interruption called an
    Adapter Interruption (AI) to present completion events for
    pending networking or SCSI operations.  The main difference
    between a traditional I/O Interruption and an AI is that
    completion events for multiple devices can be presented with
    just a single interruption.  The problem here is in the logic
    which currently merges completion events for multiple devices
    into a single I/O Interruption, specifically the merging of
    simulated AIs for a VSwitch device with real hardware generated
    AIs for a dedicated QDIO Device.
    
    The following virtual machine configuration is exposed to this
    issue:
    1. Configured and using multiple virtual processors (MP CONFIG)
    2. One or more virtual NICs coupled to one or multiple VSwitches
    3. One or more dedicated FCP, HiperSockets or OSA-Express
       Devices
    4. Both VSwitch and real QDIO Devices must be transferring data
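
    The four conditions above can be read as a single predicate.
    The following is only an illustrative sketch: GuestConfig and
    its field names are hypothetical and are not z/VM control
    blocks or command output.  It assumes the counts have already
    been gathered by some other means.

```python
from dataclasses import dataclass


@dataclass
class GuestConfig:
    """Hypothetical summary of one guest's configuration (not a z/VM structure)."""
    virtual_cpus: int            # number of virtual processors defined and in use
    active_vswitch_nics: int     # virtual NICs coupled to a VSwitch and transferring data
    active_dedicated_qdio: int   # dedicated OSA-Express, HiperSockets or FCP devices transferring data


def is_exposed(cfg: GuestConfig) -> bool:
    """Return True only when all four listed conditions hold at once."""
    return (
        cfg.virtual_cpus > 1
        and cfg.active_vswitch_nics >= 1
        and cfg.active_dedicated_qdio >= 1
    )
```

    Under these assumptions, a guest with a single virtual
    processor, or one whose VSwitch and dedicated devices are not
    both transferring data, does not match the exposed
    configuration.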
    

Problem conclusion

  • For a real dedicated QDIO Device, the hardware has the ability
    to reflect an Adapter Interruption (AI) directly to an
    operating system running in a virtual machine without z/VM
    involvement.  For a simulated VSwitch Device, z/VM is
    responsible for generating a simulated AI to report a
    completion event.  Therefore, when running a configuration using
    both real and simulated devices, a hardware generated AI can
    inform the operating system of completion events for both real
    and simulated devices.
    
    In order to generate a simulated AI, z/VM uses a special AI
    Virtual Device Block (AIF VDEV) to serialize the presentation
    of a simulated AI to a virtual machine.  The AIF VDEV is only
    required for a simulated AI, since a real device can pass the
    AI directly to a virtual machine.  The problem at hand is
    directly related to the high frequency of z/VM's need to acquire
    the AIF VDEV Lock to determine whether it can merge a simulated
    AI with an already pending real AI.
    
    During moderate to high bandwidth data transfers on all QDIO
    devices, a large number of independent completion event tasks
    can be queued waiting for the exclusive AIF VDEV Lock.  Each
    task needs the lock to determine whether z/VM can
    architecturally merge its completion event.  If the virtual
    processor that z/VM selected to present the interruption is
    slow in processing for any reason while it holds the AIF VDEV
    Lock, the queue of pending tasks can grow long enough to
    exhaust all available memory in the System Execution Space,
    causing the system to abend.
    
    Additional logic is added to optimize the merging of multiple
    completion events into a single event.  This is accomplished by
    eliminating the need to acquire the AIF VDEV Lock when a
    previous completion event task is already pending to do the
    work.  As a result, no more than a few tasks are ever waiting
    for the lock at any point in time.
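
    The effect of skipping the lock when a task is already pending
    can be illustrated with a toy model.  This is a sketch under
    stated assumptions, not the CP implementation: AifLockModel,
    completion_event, lock_released and simulate are hypothetical
    names, and the model is simplified so that the lock holder
    never releases the lock while events arrive and a single
    pending task is enough to present every merged event.

```python
from collections import deque


class AifLockModel:
    """Toy model of tasks queued behind one exclusive lock (hypothetical names)."""

    def __init__(self, coalesce: bool):
        self.coalesce = coalesce
        self.waiting = deque()        # tasks queued for the lock
        self.pending_devices = set()  # devices with unpresented completion events
        self.peak_waiting = 0         # longest queue observed

    def completion_event(self, device: str) -> None:
        """Record a completion event and queue a task only if one is needed."""
        self.pending_devices.add(device)
        if self.coalesce and self.waiting:
            # A queued task will already present this event: do not queue another.
            return
        self.waiting.append(device)
        self.peak_waiting = max(self.peak_waiting, len(self.waiting))

    def lock_released(self) -> set:
        """The next waiter gets the lock and presents all pending events at once."""
        if not self.waiting:
            return set()
        self.waiting.popleft()
        presented, self.pending_devices = self.pending_devices, set()
        return presented


def simulate(coalesce: bool, events: int = 10_000) -> int:
    """Deliver events while the lock holder is stalled; return the peak queue length."""
    model = AifLockModel(coalesce)
    for i in range(events):
        model.completion_event("NIC" if i % 2 else "FCP")
    return model.peak_waiting
```

    In this model the queue grows by one task per event when
    coalescing is off, and stays at a single task when it is on.
    The actual fix is described only as keeping the number of
    waiting tasks to a few, so the exact bound here is an artifact
    of the simplification.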
    

Temporary fix

  • FOR RELEASE VM/ESA CP/ESA R710 :
    PREREQ: VM66302 VM66426
    CO-REQ: NONE
    IF-REQ: NONE
    FOR RELEASE VM/ESA CP/ESA R720 :
    PREREQ: VM66426
    CO-REQ: NONE
    IF-REQ: NONE
    

Comments

APAR Information

  • APAR number

    VM66487

  • Reported component name

    VM CP

  • Reported component ID

    568411202

  • Reported release

    710

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-01-08

  • Closed date

    2021-01-12

  • Last modified date

    2023-08-28

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UM35807 UM35808

Modules/Macros

  • HCPVAI   HCPVIS   HCPVQA
    

Fix information

  • Fixed component name

    VM CP

  • Fixed component ID

    568411202

Applicable component levels

  • R710 PSY UM35807

       UP21/01/12 P 2101 ¢

  • R720 PSY UM35808

       UP21/01/12 P 2101 ¢

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
29 August 2023