APAR status
Closed as program error.
Error description
ABSTRACT: cacache.<cluster> needs to be refreshed if subnets is changed
Error Description: If subnets is defined, GPFS caches the IP addresses in the subnets in the /var/mmfs/gen/cacache* files. The next time a node needs to talk to a remote cluster, it uses the cached IP address from the cacache* file. However, when the subnets definition is changed, the cacache* files are not refreshed, so the obsolete cached IP address is still used.
Reported in: Spectrum Scale 5.1.1.4
Known Impact: The changed subnets setting does not take effect for remote cluster communication, and the obsolete IP address is still used.
Verification steps: Change subnets, then check the /var/mmfs/gen/cacache* files.
Recovery action: Remove the /var/mmfs/gen/cacache* files manually and restart GPFS.
Local Fix: Remove the /var/mmfs/gen/cacache* files before changing the subnets.
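The verification and recovery steps above can be sketched as a small helper. This is a minimal illustration, not an IBM-supplied tool: the function names and the dry_run parameter are hypothetical, and only the /var/mmfs/gen/cacache* path comes from this APAR. Restarting GPFS after removal is left to the administrator.

```python
import glob
import os


def find_cache_files(gen_dir="/var/mmfs/gen"):
    """Return the sorted list of regular files matching cacache* in gen_dir."""
    pattern = os.path.join(gen_dir, "cacache*")
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def remove_cache_files(gen_dir="/var/mmfs/gen", dry_run=True):
    """List the cache files and delete them unless dry_run is True.

    Returns the list of paths that were (or would be) removed.
    """
    paths = find_cache_files(gen_dir)
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths


if __name__ == "__main__":
    # Dry run by default: only print what would be removed.
    for path in remove_cache_files(dry_run=True):
        print(path)
```

Running with dry_run=True first makes it possible to inspect which cache files exist before anything is deleted.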
Local fix
Problem summary
Currently, there is no mechanism to clean up the cached subnets contact IP addresses. If the subnets configuration changes and a cached IP address no longer works, the nodes may not be able to communicate with each other.
Problem conclusion
This problem is fixed in 5.1.2 PTF 8.
To see all Spectrum Scale APARs and their respective fix solutions, refer to https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: Stale subnets cache files are cleaned up automatically when the subnets configuration is changed.
Work Around: Manually clean up the stale /var/mmfs/gen/cacache.* files.
Problem trigger: Normally, GPFS uses the daemon IP address for communication. If the cluster needs to use other IP addresses for communication, the "subnets" configuration must be set, and GPFS then uses the "subnets" IP addresses for daemon communication. To switch to them, GPFS does the following:
- In the cluster probing stage, a pair of nodes uses daemon IP addresses for communication.
- After the connection is established, the two nodes exchange their "subnets" IP addresses.
- The connection that uses the daemon IP addresses is closed.
- A new connection that uses the "subnets" IP addresses is established.
Once the "subnets" IP addresses are cached, GPFS uses these cached IP addresses for communication. The problem occurs when the cached "subnets" IP addresses are no longer reachable. Even if a new "subnets" is configured or "subnets" is removed, the nodes cannot use the original "subnets" addresses to exchange the new IP addresses that the customer wants to use.
Symptom: Cluster/File System Outage
Platforms affected: ALL Operating System environments
Functional Area affected: subnets/remote cluster
Customer Impact: High Importance
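The problem trigger can be illustrated with a toy model. This is not GPFS code: the function names, the dictionary-based cache, the example IP addresses, and the fallback to the daemon IP address when the advertised address is unreachable are all assumptions made for illustration. The sketch only shows why a cached address that is never invalidated blocks communication after a subnets change, and how clearing the cache on a configuration change avoids it.

```python
def choose_ip(peer, cache, daemon_ip, advertised_ip, reachable):
    """Return the IP address used to reach peer, or None if contact fails.

    cache:         dict mapping peer name -> cached "subnets" IP address
    daemon_ip:     the peer's daemon IP address
    advertised_ip: the "subnets" IP address the peer would exchange now
    reachable:     set of IP addresses that currently answer
    """
    cached = cache.get(peer)
    if cached is not None:
        # A cached address is used directly, without probing again.
        return cached if cached in reachable else None

    # Probing stage: connect over the daemon IP address first.
    if daemon_ip not in reachable:
        return None

    # Exchange "subnets" addresses, switch to them, and cache the result.
    if advertised_ip is not None and advertised_ip in reachable:
        cache[peer] = advertised_ip
        return advertised_ip
    return daemon_ip  # assumed fallback for this toy model


def on_subnets_change(cache):
    """Model of the fix: drop every cached address when subnets changes."""
    cache.clear()


if __name__ == "__main__":
    cache = {}
    # Initial configuration: the peer is reachable on 192.168.1.2.
    print(choose_ip("peer", cache, "10.0.0.2", "192.168.1.2",
                    {"10.0.0.2", "192.168.1.2"}))
    # subnets is changed: the old address stops answering.
    now_reachable = {"10.0.0.2", "172.16.0.2"}
    print(choose_ip("peer", cache, "10.0.0.2", "172.16.0.2", now_reachable))
    # With the cache cleared, probing over the daemon IP works again.
    on_subnets_change(cache)
    print(choose_ip("peer", cache, "10.0.0.2", "172.16.0.2", now_reachable))
```

In this model the second call fails because the stale cache entry bypasses the probing stage, which corresponds to the outage symptom described above.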
Temporary fix
Comments
APAR Information
APAR number
IJ41370
Reported component name
SPEC SCALE ADV
Reported component ID
5737F35AP
Reported release
511
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-07-27
Closed date
2022-11-03
Last modified date
2022-11-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE ADV
Fixed component ID
5737F35AP
Applicable component levels
Document Information
Modified date:
03 November 2022