APAR status
Closed as program error.
Error description
Spectrum Scale Erasure Code Edition interacts with third-party software/hardware APIs for internal disk enclosure management. If the management interface becomes degraded and commands start to hang in the kernel, the hang may also block communication handling threads. The affected node then fails to renew its lease and is fenced off from the rest of the cluster, which may lead to additional outages. A previous APAR was issued for this problem in 5.1.4, but that fix was incomplete.
Local fix
A node with hardware problems will show waiters with the reason 'Until NSPDServer discovery completes.' If these GPFS waiters exceed 2 minutes on a node and that node is also being expelled, it is recommended to reboot the node.
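The check described in the local fix can be sketched as a small script. This is a minimal illustration, not an IBM-provided tool: it assumes the waiter list (for example, text captured from `mmdiag --waiters`) contains lines of the form `Waiting <seconds> sec ...` followed by the reason string, and the thread names in the sample are hypothetical. Adjust the pattern if the output on your release differs.

```python
import re

# Assumed line shape (an assumption, not taken from this APAR):
#   Waiting 135.2041 sec since 10:20:33, monitored, thread 3980 <name>: ... reason '<text>'
WAIT_RE = re.compile(r"[Ww]aiting\s+(?P<secs>\d+(?:\.\d+)?)\s+sec")
REASON = "Until NSPDServer discovery completes"
THRESHOLD_SECS = 120.0  # the 2 minutes named in the local fix


def long_discovery_waiters(waiters_text, threshold=THRESHOLD_SECS):
    """Return (seconds, line) for each matching waiter older than threshold."""
    hits = []
    for line in waiters_text.splitlines():
        if REASON not in line:
            continue  # not the waiter this APAR describes
        match = WAIT_RE.search(line)
        if match and float(match.group("secs")) > threshold:
            hits.append((float(match.group("secs")), line.strip()))
    return hits


# Hypothetical sample input; thread names and addresses are made up.
sample = """\
Waiting 135.2041 sec since 10:20:33, monitored, thread 3980 DiscoveryThread: on ThCond 0x1, reason 'Until NSPDServer discovery completes'
Waiting 4.0100 sec since 10:22:40, monitored, thread 4012 NSDThread: for I/O completion
Waiting 12.5000 sec since 10:22:31, monitored, thread 4100 DiscoveryThread: on ThCond 0x2, reason 'Until NSPDServer discovery completes'
"""

for secs, line in long_discovery_waiters(sample):
    print(f"{secs:.0f}s: {line}")
```

A non-empty result only indicates a reboot candidate. Per the local fix, the node should also be the one being expelled before it is rebooted.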
Problem summary
Spectrum Scale Erasure Code Edition interacts with third-party software/hardware APIs for internal disk enclosure management. If the management interface becomes degraded and commands start to hang in the kernel, the hang may also block communication handling threads. The affected node then fails to renew its lease and is fenced off from the rest of the cluster, which may lead to additional outages. A previous APAR was issued for this problem in 5.1.4, but that fix was incomplete.
Problem conclusion
This problem is fixed in 5.1.2.15.
To see all Spectrum Scale APARs and their respective fix solutions, refer to: https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: Code was further reworked to break a lock ordering dependency that tightly coupled the RPC handling mechanism to the storage back-end management software. Degradation of back-end storage management no longer causes node expels.
Work Around: A node with hardware problems will show waiters with the reason 'Until NSPDServer discovery completes.' If these GPFS waiters exceed 2 minutes on a node and that node is also being expelled, it is recommended to reboot the node.
Problem trigger: Degradation in back-end storage management that causes commands to hang in the kernel.
Symptom: Hang/Deadlock/Unresponsiveness/Long Waiters
Platforms affected: Linux Only
Functional Area affected: ESS/GNR
Customer Impact: High Importance
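The lock ordering dependency mentioned in the problem conclusion can be illustrated with a generic sketch. This is not the GPFS implementation: `state_lock`, `query_enclosure`, and both worker functions are hypothetical stand-ins, and the model simply assumes that lease renewal needs a lock that another thread may hold while calling into a hung management interface.

```python
import threading

state_lock = threading.Lock()       # hypothetical lock shared with RPC/lease handling
backend_hung = threading.Event()    # never set: simulates a hung management command
backend_entered = threading.Event()


def query_enclosure(timeout):
    """Hypothetical stand-in for a management call that may hang."""
    backend_entered.set()
    backend_hung.wait(timeout)      # blocks for `timeout` seconds, like a hang


def coupled_worker(timeout):
    # Problem shape: the blocking call is made while the shared lock is held.
    with state_lock:
        query_enclosure(timeout)


def decoupled_worker(timeout):
    # Reworked shape: touch shared state under the lock, release it,
    # and only then call into the back end.
    with state_lock:
        pass
    query_enclosure(timeout)


def lease_renewal_succeeds(worker, hang_secs=1.0, renew_deadline=0.2):
    """Start worker, then try to take the shared lock as a lease thread would."""
    backend_entered.clear()
    thread = threading.Thread(target=worker, args=(hang_secs,))
    thread.start()
    backend_entered.wait()          # worker is now inside the "hung" call
    got_lock = state_lock.acquire(timeout=renew_deadline)
    if got_lock:
        state_lock.release()
    thread.join()
    return got_lock


print("coupled:  ", lease_renewal_succeeds(coupled_worker))
print("decoupled:", lease_renewal_succeeds(decoupled_worker))
```

In the coupled shape the simulated lease thread misses its deadline because the lock is held across the hung call. In the decoupled shape the hang stays confined to the worker thread.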
Temporary fix
Comments
APAR Information
APAR number
IJ49543
Reported component name
SPEC SCALE STD
Reported component ID
5737F33AP
Reported release
512
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2023-12-14
Closed date
2023-12-14
Last modified date
2023-12-14
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE STD
Fixed component ID
5737F33AP
Applicable component levels
Document Information
Modified date:
15 December 2023