IBM Support

IJ41651: GPFS NODE OS CRASH: PANIC: "BUG: UNABLE TO HANDLE KERNEL NULL PO


APAR status

  • Closed as program error.

Error description

  • The OS on a GPFS member node crashed unexpectedly with
    the following error:
    
    PANIC: "BUG: unable to handle kernel NULL pointer
    dereference at 000000000000000c"
    
    This issue was found on GPFS version 5.1.2.2.
    

Local fix

Problem summary

  • Linux kernel 4.2 added a new field (i_link) to the
    Linux inode data structure. When an inode is reused
    under heavy workload, this field might not be
    initialized correctly, leading to a kernel crash when
    the symlink is accessed.
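
The failure mode described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct toy_inode, toy_reuse and toy_demo are hypothetical names invented for illustration, not kernel or GPFS code, and the real Linux inode has many more fields. The model shows how a field left over from a previous use of a recycled object survives unless it is explicitly reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel inode: only the parts needed here. */
struct toy_inode {
    unsigned long i_ino;   /* inode number of the current user */
    const char   *i_link;  /* cached symlink target, NULL if none */
};

/* One-slot "cache": the same object is handed out again and again. */
static struct toy_inode slot;

/* Hand the slot out for a new file. When reset_link is 0, i_link keeps
 * whatever value the previous user left behind (the faulty path). */
static struct toy_inode *toy_reuse(unsigned long ino, int reset_link)
{
    slot.i_ino = ino;
    if (reset_link)
        slot.i_link = NULL;
    return &slot;
}

/* Returns 1 if the second user of the object sees a leftover i_link. */
static int toy_demo(int reset_link)
{
    static const char old_target[] = "old-target";

    struct toy_inode *a = toy_reuse(1, 1);
    a->i_link = old_target;            /* first user caches a target */

    struct toy_inode *b = toy_reuse(2, reset_link);
    assert(a == b);                    /* same object, recycled */

    return b->i_link != NULL;
}
```

In this model toy_demo(0) returns 1 (the leftover pointer is still visible after reuse) and toy_demo(1) returns 0. The leftover pointer here is still valid, so nothing crashes; in the kernel, following such a leftover value is what the summary describes as leading to the crash.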
    

Problem conclusion

  • This problem is fixed in 5.1.2 PTF 7.
    To see all Spectrum Scale APARs and their respective
    fix solutions, refer to the page
    https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
    
    Benefits of the solution:
    The solution is to initialize the field correctly
    to avoid the crash.
    
    Work Around:
    It is possible to manually patch the GPL layer:
    Edit the file /usr/lpp/mmfs/src/gpl-linux/inode.c
    In function cxiSetOSNode, after the line "case S_IFLNK:",
    insert a new line with:
    inodeP->i_link = NULL;
    Run mmbuildgpl again and restart GPFS on the node.
    
    Problem trigger:
    This problem is highly dependent on the workload.
    If a workload creates directories, creates files
    underneath them, deletes directories, and also creates
    symlinks, there is a chance of hitting this problem.
    Build systems are a type of software that can exhibit
    this pattern.
    
    Symptom: Abend/Crash
    Platforms affected: ALL Linux OS environments
    (excluding RHEL7, since that does not include the
    mentioned Linux kernel change)
    Functional Area affected: All Scale Users
    Customer Impact: Critical
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ41651

  • Reported component name

    SPEC SCALE STD

  • Reported component ID

    5737F33AP

  • Reported release

    512

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-08-08

  • Closed date

    2022-08-23

  • Last modified date

    2022-08-23

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE STD

  • Fixed component ID

    5737F33AP

Applicable component levels


Document Information

Modified date:
23 August 2022