IBM Support

IJ41370: CACACHE.<CLUSTER> NEEDS TO BE REFRESHED IF SUBNETS IS CHANGED


APAR status

  • Closed as program error.

Error description

  • ABSTRACT:
    cacache.<cluster> needs to be refreshed if subnets is
    changed
    
    Error Description:
    
    If subnets is defined, GPFS caches the IP addresses in
    those subnets in the /var/mmfs/gen/cacache* files. The
    next time a node needs to talk to a remote cluster, it
    uses the cached IP address from the cacache* file. However,
    when the subnets definition is changed, the cacache* files
    are not refreshed, so the obsolete cached IP address is
    still used.
    
    Reported in:
    Spectrum Scale 5.1.1.4
    
    Known Impact:
    The new subnets definition does not take effect for remote
    cluster communication; the obsolete IP address is still used.
    
    Verification steps:
    Change subnets, then check the /var/mmfs/gen/cacache* files.
    
    Recovery action:
    Remove the /var/mmfs/gen/cacache* files manually and restart
    GPFS.
    
    Local Fix:
    Remove the /var/mmfs/gen/cacache* files before changing the
    subnets definition.
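
The recovery action above can be sketched as a small Python helper. This is only an illustration, not an IBM-provided tool: the path pattern comes from this APAR, while the function name remove_cacache_files and its dry_run parameter are hypothetical. It defaults to a dry run so that nothing is deleted until the list of matches has been reviewed.

```python
import glob
import os

# Path pattern taken from the APAR text; everything else here is illustrative.
CACHE_GLOB = "/var/mmfs/gen/cacache*"


def remove_cacache_files(pattern=CACHE_GLOB, dry_run=True):
    """Return the regular files matching `pattern`.

    The files are deleted only when dry_run is False.
    """
    matches = sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
    if not dry_run:
        for path in matches:
            os.remove(path)
    return matches
```

Calling remove_cacache_files() reports what would be removed; remove_cacache_files(dry_run=False) deletes the files. The helper does not restart GPFS, so that part of the recovery action remains a manual step on the node.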
    

Local fix

Problem summary

  • Currently, there is no mechanism to clean up the subnets
    contact IP address caches. If the subnets configuration
    changes and a cached IP address no longer works, the nodes
    may not be able to communicate with each other.
    

Problem conclusion

  • This problem is fixed in 5.1.2 PTF 8.
    To see all Spectrum Scale APARs and
    their respective fix solutions, refer to the page
    https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
    
    Benefits of the solution:
    Stale subnets cache files are cleaned up automatically
    when the subnets configuration is changed.
    
    Work Around:
    Manually clean up the stale /var/mmfs/gen/cacache.* files.
    Problem trigger:
    Normally, GPFS uses the daemon IP address for
    communication. If the cluster needs to use other
    IP addresses for communication, the "subnets"
    configuration must be set; GPFS then uses the "subnets"
    IP addresses for daemon communication. The connection
    is set up as follows:
     - In the cluster probing stage, a pair of nodes
    uses daemon IP addresses for communication.
     - After the connection is established, the pair of nodes
    exchanges their "subnets" IP addresses.
     - The connection that uses daemon IP addresses is closed.
     - A new connection that uses the "subnets" IP addresses
    is established.
    Once the "subnets" IP addresses are cached,
    GPFS uses these cached IP addresses for communication.
    The problem occurs when the cached "subnets" IP addresses
    are no longer reachable. Even if a new "subnets" value is
    configured or "subnets" is removed, the nodes cannot use the
    original "subnets" addresses to exchange the new IP
    addresses that the customer wants to use.
    Symptom:
    Cluster/File System Outage
    Platforms affected:
    ALL Operating System environments
    Functional Area affected:
    subnets/remote cluster
    Customer Impact:
    High Importance
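
The problem trigger described above can be illustrated with a toy model in Python. This is a simplified sketch and not GPFS code: the class ContactCache, its methods, and the refresh_on_change flag are all hypothetical names, and the probing and address exchange are reduced to a lookup over a list of addresses. It only shows why a cache that is never invalidated keeps returning the old "subnets" address, and how clearing it on a configuration change avoids that.

```python
import ipaddress


class ContactCache:
    """Toy model of a contact IP cache for remote nodes (not GPFS code)."""

    def __init__(self, refresh_on_change):
        self.refresh_on_change = refresh_on_change
        self.subnets = None    # configured subnet in CIDR form, or None
        self.cached_ip = {}    # remote node name -> cached "subnets" IP

    def set_subnets(self, cidr):
        # With the fix modeled, a configuration change drops stale entries.
        if cidr != self.subnets and self.refresh_on_change:
            self.cached_ip.clear()
        self.subnets = cidr

    def connect(self, node, daemon_ip, node_ips):
        """Return the IP address that would be used to reach `node`."""
        # A cached address is used without consulting the current subnets.
        if node in self.cached_ip:
            return self.cached_ip[node]
        # Otherwise: probe over the daemon IP, exchange addresses, and
        # switch to an address inside the configured subnet if one exists.
        if self.subnets is not None:
            net = ipaddress.ip_network(self.subnets)
            for ip in node_ips:
                if ipaddress.ip_address(ip) in net:
                    self.cached_ip[node] = ip
                    return ip
        return daemon_ip


node_ips = ["192.168.0.5", "10.1.0.5", "10.2.0.5"]

stale = ContactCache(refresh_on_change=False)
stale.set_subnets("10.1.0.0/16")
first = stale.connect("nodeA", "192.168.0.5", node_ips)
stale.set_subnets("10.2.0.0/16")
after_change = stale.connect("nodeA", "192.168.0.5", node_ips)

fixed = ContactCache(refresh_on_change=True)
fixed.set_subnets("10.1.0.0/16")
fixed.connect("nodeA", "192.168.0.5", node_ips)
fixed.set_subnets("10.2.0.0/16")
after_fix = fixed.connect("nodeA", "192.168.0.5", node_ips)
```

In this model, after_change is still the address from the old subnet, while after_fix is the address from the newly configured subnet.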
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ41370

  • Reported component name

    SPEC SCALE ADV

  • Reported component ID

    5737F35AP

  • Reported release

    511

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-07-27

  • Closed date

    2022-11-03

  • Last modified date

    2022-11-03

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE ADV

  • Fixed component ID

    5737F35AP


Document Information

Modified date:
03 November 2022