IBM Support

IJ38283: SPECTRUM SCALE ERASURE CODE EDITION BACK-END STORAGE MANAGEMENT DEGRADATION CAN CAUSE NODE EXPULSION.

APAR status

  • Closed as program error.

Error description

  • Spectrum Scale Erasure Code Edition uses third-party
    software and hardware APIs to manage internal disk
    enclosures.
    If that management interface degrades and commands
    begin to hang in the kernel, the hang can also block
    the threads that handle cluster communication.
    The node then fails to renew its lease and is
    expelled (fenced off) from the rest of the cluster,
    which may lead to additional outages.
    

Local fix

  • A node with the hardware problem shows GPFS waiters
    with the reason 'Until NSPDServer discovery completes.'
    Reboot any node on which these waiters exceed
    2 minutes.
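
The check described above can be sketched in Python. This is a minimal illustration, not an IBM-provided tool: it assumes the waiter listing has already been captured as text (for example from `mmdiag --waiters`) and that each waiter line contains the wait time in the form "waiting <seconds> sec" together with the reason string. That line format and the names `long_discovery_waiters` and `needs_reboot` are assumptions made for this example; adjust the pattern to the output of your release.

```python
import re

# Waiter reason quoted in this APAR.
REASON = "Until NSPDServer discovery completes"
# Threshold from the APAR's local fix: 2 minutes.
THRESHOLD_SEC = 120.0

# ASSUMPTION: each waiter line carries its age as "waiting <seconds> sec".
# Adjust this pattern if your waiter output is formatted differently.
_WAIT_RE = re.compile(r"waiting\s+([0-9]+(?:\.[0-9]+)?)\s+sec", re.IGNORECASE)


def long_discovery_waiters(text, threshold=THRESHOLD_SEC):
    """Return the wait times (seconds) of matching waiters above threshold."""
    hits = []
    for line in text.splitlines():
        if REASON not in line:
            continue  # not the waiter this APAR describes
        match = _WAIT_RE.search(line)
        if match and float(match.group(1)) > threshold:
            hits.append(float(match.group(1)))
    return hits


def needs_reboot(text, threshold=THRESHOLD_SEC):
    """True if at least one matching waiter exceeds the threshold."""
    return bool(long_discovery_waiters(text, threshold))
```

Pass the captured waiter text for one node to `needs_reboot`; a True result only flags the node for review against the guidance above and does not reboot anything.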
    

Problem summary

  • Spectrum Scale Erasure Code Edition uses third-party
    software and hardware APIs to manage internal disk
    enclosures.
    If that management interface degrades and commands
    begin to hang in the kernel, the hang can also block
    the threads that handle cluster communication.
    The node then fails to renew its lease and is
    expelled (fenced off) from the rest of the cluster,
    which may lead to additional outages.
    

Problem conclusion

  • This problem is fixed in 5.1.3 PTF 1.
    For all Spectrum Scale APARs and their respective
    fixes, see:
    https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
    
    Benefits of the solution:
    Degradation of back-end storage management
    no longer causes node expels.
    
    Work Around:
    A node with the hardware problem shows GPFS waiters
    with the reason 'Until NSPDServer discovery completes.'
    Reboot any node on which these waiters exceed
    2 minutes.
    
    Problem trigger:
    Degradation in back-end storage management
    that causes commands to hang in the kernel.
    
    Symptom: Hang/Deadlock/Unresponsiveness/Long Waiters
    
    Platforms affected:  Linux Only
    
    Functional Area affected: ESS/GNR
    
    Customer Impact: High Importance
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ38283

  • Reported component name

    SPEC SCALE STD

  • Reported component ID

    5737F33AP

  • Reported release

    513

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-03-08

  • Closed date

    2022-03-08

  • Last modified date

    2022-03-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE STD

  • Fixed component ID

    5737F33AP

Document Information

Modified date:
08 March 2022