
IBM Spectrum Scale Active File Management (AFM) and AFM Asynchronous Disaster Recovery (ADR) issues which may result in undetected data corruption

Flashes (Alerts)


Abstract

IBM has identified certain issues affecting Active File Management (AFM) and AFM Asynchronous Disaster Recovery (ADR) in IBM Spectrum Scale which may result in undetected data corruption.

Content


1. AFM may intermittently read files from the home cluster incorrectly if the data replication factor at the cache cluster is more than one, which may result in undetected data corruption.

Problem Summary:
As a result of an incorrect calculation of the number of data blocks allocated to a file when the data replication factor is more than one, AFM may cache the file without reading the whole file from the home cluster. Applications reading the file may then fail, or undetected data corruption may occur. In the latter case, the data read by applications may be corrupted (possibly all zeros), and the system call used to read the data returns no error.
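Because the corrupted reads return no error, one way to look for the symptom is to compare each cached file with its home copy. The following is a minimal, hypothetical helper (compare_trees, sha256_of and is_all_zero are illustrative names, not IBM tools) that assumes both the cache fileset and the home export are reachable as local paths on the node running it. Reading files that are not yet cached makes AFM fetch them from home, and reading HSM-migrated files at home may trigger recalls.

```python
import hashlib
import os
import sys

CHUNK = 1 << 20  # read 1 MiB at a time to bound memory use


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's full contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def is_all_zero(path):
    """True if a non-empty file contains only zero bytes."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            if chunk.count(0) != len(chunk):
                return False
    return True


def compare_trees(cache_root, home_root):
    """Yield (relative_path, reason) for cached files that differ from home."""
    for dirpath, _dirs, files in os.walk(cache_root):
        for name in files:
            cache_path = os.path.join(dirpath, name)
            rel = os.path.relpath(cache_path, cache_root)
            home_path = os.path.join(home_root, rel)
            if not os.path.exists(home_path):
                continue  # nothing at home to compare against
            if sha256_of(cache_path) != sha256_of(home_path):
                reason = "all zeros" if is_all_zero(cache_path) else "checksum mismatch"
                yield rel, reason


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are placeholders): python afm_compare.py CACHE_DIR HOME_DIR
    for rel, reason in compare_trees(sys.argv[1], sys.argv[2]):
        print(f"{rel}: {reason}")
```

Comparing full checksums catches any mismatch; the all-zero check only labels the specific symptom described above.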

Users affected:
1. AFM caching is running on GPFS V4.1.0.0 thru V4.1.0.8, or IBM Spectrum Scale V4.1.1.0 thru V4.1.1.19, V4.2.0.0 thru V4.2.3.8, or V5.0.0.0 thru V5.0.0.2; and
2. The cache file system data replication factor is more than one.
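The two conditions above can be expressed as a simple check. This is a sketch only: the version ranges are copied from the list above, the function names are hypothetical, the replication factor is an input you supply (for example, the file system default data replication factor; individual files can have a different value), and efix suffixes on a level are not parsed.

```python
# Affected levels for issue 1, copied from the "Users affected" list (inclusive).
AFFECTED_ISSUE1 = [
    ((4, 1, 0, 0), (4, 1, 0, 8)),
    ((4, 1, 1, 0), (4, 1, 1, 19)),
    ((4, 2, 0, 0), (4, 2, 3, 8)),
    ((5, 0, 0, 0), (5, 0, 0, 2)),
]


def parse_version(text):
    """Turn '4.2.3.8' (optionally prefixed with 'V') into a comparable tuple."""
    return tuple(int(part) for part in text.lstrip("Vv").split("."))


def in_ranges(version, ranges):
    """True if the version falls inside any inclusive (low, high) range."""
    v = parse_version(version)
    return any(lo <= v <= hi for lo, hi in ranges)


def affected_by_issue1(version, data_replication_factor):
    """Both conditions must hold: an affected level AND replication above one."""
    return in_ranges(version, AFFECTED_ISSUE1) and data_replication_factor > 1


print(affected_by_issue1("V4.2.3.8", 2))  # True
print(affected_by_issue1("4.2.3.9", 2))   # False: fixed level
```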

Recommendations:
- Any user meeting both conditions should either upgrade to a level of code containing the fix, or obtain and apply an efix for their level of code by contacting IBM Service:

Users running GPFS V4.1.0.0 thru V4.1.0.8 or IBM Spectrum Scale V4.1.1.0 thru V4.1.1.19 should apply IBM Spectrum Scale V4.1.1.20 or later, available from Fix Central at: http://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.1.1&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06264.

Users running IBM Spectrum Scale V4.2.0.0 thru V4.2.3.8 should apply IBM Spectrum Scale V4.2.3.9 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.2.2&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06269.

Users running IBM Spectrum Scale V5.0.0.0 thru V5.0.0.2 should apply IBM Spectrum Scale V5.0.1.0 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=5.0.1&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06307.

- If you believe that your GPFS file system may be affected by this issue, please contact IBM Service as soon as possible for further guidance and assistance.

2. AFM cache may incorrectly read an HSM migrated file from the home cluster due to the incorrect calculation of the file sparseness information, potentially resulting in undetected data corruption.

Problem Summary:
Before reading a file, AFM queries the home cluster for the file's sparseness information so that it reads exactly the same number of blocks and creates a matching sparse file at the cache. Because the number of data blocks allocated to a fully migrated file is zero, AFM skips reading the file in that case.
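To illustrate why a fully migrated file defeats this logic, the following toy model shows a fetch decision that trusts the allocated-block count reported by home. HomeFileInfo and cache_fetch are hypothetical names, sparse_threshold stands in for the afmReadSparseThreshold parameter, and this is not AFM's actual code; real sparse handling reads the allocated regions rather than all or nothing.

```python
from dataclasses import dataclass


@dataclass
class HomeFileInfo:
    size: int              # file size in bytes as reported by home
    blocks_allocated: int  # data blocks allocated at home (0 when fully HSM-migrated)
    contents: bytes        # what a real read at home would return (recalling if needed)


def cache_fetch(info: HomeFileInfo, sparse_threshold: int) -> bytes:
    """Hypothetical model of the flawed fetch decision described above."""
    if sparse_threshold > 0 and info.size > sparse_threshold:
        if info.blocks_allocated == 0:
            # Flawed assumption: zero allocated blocks means the file is one big
            # hole, so nothing is read and the cached file is all zeros.
            return bytes(info.size)
    # Otherwise the data is read from home (recalling a migrated file).
    return info.contents


migrated = HomeFileInfo(size=8, blocks_allocated=0, contents=b"realdata")
print(cache_fetch(migrated, sparse_threshold=4))  # b'\x00\x00\x00\x00\x00\x00\x00\x00'
print(cache_fetch(migrated, sparse_threshold=0))  # b'realdata'
```

The model also shows why all four conditions listed below are needed: with the threshold disabled, or with blocks allocated at home, the data is read normally.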

Users affected:
Users may be affected when all of the following conditions are met:
1. AFM caching is running on IBM Spectrum Scale V5.0.0.0 thru V5.0.0.2; and
2. HSM is enabled at the home and the file is migrated; and
3. The home cluster is enabled for AFM (the mmafmconfig command was executed); and
4. The afmReadSparseThreshold configuration parameter is enabled, and the file size exceeds the value of afmReadSparseThreshold.

Recommendations:
- Any customer meeting all of these conditions should either upgrade to a level of code for which a PTF is available, or obtain and apply an efix for their level of code by contacting IBM Service:

Users running IBM Spectrum Scale V5.0.0.0 thru V5.0.0.2 should apply IBM Spectrum Scale V5.0.1.0 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=5.0.1&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06307.

- If you believe that your GPFS file system may be affected by this issue, please contact IBM Service as soon as possible for further guidance and assistance.

3. The AFM mmafmctl Device resync/failover commands and the AFM ADR mmafmctl Device changeSecondary command may fail to copy data to the home or secondary cluster when the in-memory queue is dropped while in-place writes are pending.

Problem Summary:
For each write chunk it replicates, AFM sets the mtime of the home file to the latest mtime from the cache. If multiple in-place writes are queued on the same file, the in-memory queue can be dropped (because of a node failure or a replication error) after one of the writes has already set the latest mtime at home. If the administrator then runs the AFM mmafmctl Device resync/failover command or the AFM ADR mmafmctl Device changeSecondary command, AFM may not copy the modified data to the target (home or secondary) site, because the file mtime, file size, and number of allocated data blocks match between the cache (or primary) and the home (or secondary).
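The following toy model shows how a metadata-only comparison can miss a lost write. It assumes, for illustration only, that the decision to copy a file depends purely on mtime, file size, and allocated block count, as the summary above describes; FileState, needs_copy and scenario are hypothetical names, not AFM internals.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    mtime: int
    size: int
    blocks: int
    data: bytearray = field(default_factory=bytearray)


def needs_copy(src: FileState, dst: FileState) -> bool:
    """Copy only if mtime, size, or allocated block count differ."""
    return (src.mtime, src.size, src.blocks) != (dst.mtime, dst.size, dst.blocks)


def scenario():
    cache = FileState(mtime=100, size=8, blocks=1, data=bytearray(b"AAAAAAAA"))
    home = FileState(mtime=100, size=8, blocks=1, data=bytearray(b"AAAAAAAA"))
    # Two in-place writes are queued; afterwards the cache mtime is 102.
    cache.data[0:4] = b"BBBB"
    cache.data[4:8] = b"CCCC"
    cache.mtime = 102
    # The first chunk replicates and stamps home with the latest cache mtime.
    home.data[0:4] = cache.data[0:4]
    home.mtime = cache.mtime
    # The in-memory queue is then dropped: the second chunk never reaches home.
    return cache, home


cache, home = scenario()
print(needs_copy(cache, home))               # False: metadata matches
print(bytes(cache.data), bytes(home.data))   # b'BBBBCCCC' b'BBBBAAAA'
```

Because in-place writes change neither the size nor the block count, the mtime is the only signal left, and the early mtime update removes it.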

Users affected:
Users may be affected when both of the following conditions are met:
1. AFM caching is running on IBM Spectrum Scale V4.2.0.0 thru V4.2.0.4; or V4.2.1.0 thru V4.2.1.2; or V4.2.2.0 thru V4.2.2.3; or V4.2.3.0 thru V4.2.3.8; or V5.0.0.0 thru V5.0.0.2, and
2. The AFM mmafmctl Device resync/failover or AFM ADR mmafmctl Device changeSecondary command was executed after either:
a. an application was performing in-place writes when the AFM gateway node failed, or
b. an application was performing in-place writes when the AFM in-memory queue was dropped (indicated by a message in the mmfs log).
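The affected levels for this issue are listed per 4.2 release stream rather than as one continuous range; the same per-stream ranges appear for issue 4 and for the 4.2 levels of issue 5. A minimal sketch of the check, with the ranges copied from the list above and both parts of condition 2 supplied by the caller as booleans (the function and parameter names are hypothetical):

```python
# Affected levels for issue 3, copied from the "Users affected" list (inclusive).
AFFECTED_ISSUE3 = [
    ((4, 2, 0, 0), (4, 2, 0, 4)),
    ((4, 2, 1, 0), (4, 2, 1, 2)),
    ((4, 2, 2, 0), (4, 2, 2, 3)),
    ((4, 2, 3, 0), (4, 2, 3, 8)),
    ((5, 0, 0, 0), (5, 0, 0, 2)),
]


def parse_version(text):
    """Turn '4.2.3.8' (optionally prefixed with 'V') into a comparable tuple."""
    return tuple(int(part) for part in text.lstrip("Vv").split("."))


def level_affected(version):
    """True if the level falls inside any listed range."""
    v = parse_version(version)
    return any(lo <= v <= hi for lo, hi in AFFECTED_ISSUE3)


def affected_by_issue3(version, ran_resync_failover_or_change_secondary,
                       writes_in_flight_at_gateway_failure_or_queue_drop):
    """Condition 1 (level) AND condition 2 (command run after a lost queue)."""
    return (level_affected(version)
            and ran_resync_failover_or_change_secondary
            and writes_in_flight_at_gateway_failure_or_queue_drop)
```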

Recommendations:
- Any customer meeting both of these conditions should either upgrade to a level of code containing the fix, or obtain and apply an efix for their level of code by contacting IBM Service:

Users running IBM Spectrum Scale V4.2.0.0 thru V4.2.3.8 should apply IBM Spectrum Scale V4.2.3.9 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.2.2&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06269.

Users running IBM Spectrum Scale V5.0.0.0 thru V5.0.0.2 should apply IBM Spectrum Scale V5.0.1.0 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=5.0.1&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06307.

- If you believe that your GPFS file system may be affected by this issue, please contact IBM Service as soon as possible for further guidance and assistance.

4. AFM Asynchronous Disaster Recovery (ADR) could cause some files to be missing from the RPO snapshot at the secondary if recovery was run from the recovery+RPO snapshot.

Problem Summary:
The AFM ADR Primary mode fileset may run recovery based on the RPO snapshot if RPO snapshots are enabled on the fileset. If files are deleted in the live file system after the recovery+RPO snapshot is created, the corresponding writes are dropped, so some files are missing from the RPO snapshot at the secondary. As a result, after a failover, files may be missing from the RPO snapshot or their data may be incomplete.
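A toy model of this failure mode, under the assumption (inferred from the summary above, not taken from AFM source) that recovery builds its list of changed files from the recovery+RPO snapshot but replays the writes using the files in the live file system. replay_recovery is a hypothetical name, and dictionaries stand in for file systems.

```python
def replay_recovery(changed_in_snapshot, live_fs):
    """Return (secondary contents, dropped paths) after a modeled recovery."""
    secondary, dropped = {}, []
    for path in changed_in_snapshot:
        if path in live_fs:
            secondary[path] = live_fs[path]
        else:
            # Deleted from the live file system after the snapshot was taken:
            # the write is dropped, so the secondary never receives this file.
            dropped.append(path)
    return secondary, dropped


# The recovery+RPO snapshot recorded changes to a.txt and b.txt ...
snapshot_changes = ["a.txt", "b.txt"]
# ... but b.txt was deleted from the live file system afterwards.
live = {"a.txt": b"alpha"}
secondary, dropped = replay_recovery(snapshot_changes, live)
print(sorted(secondary), dropped)  # ['a.txt'] ['b.txt']
```

In the model, b.txt existed when the snapshot was taken, so it belongs in the RPO snapshot at the secondary, yet it never arrives there.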

Users affected:
Users may be affected when all of the following conditions are met:
1. AFM ADR is running on IBM Spectrum Scale V4.2.0.0 thru V4.2.0.4; or V4.2.1.0 thru V4.2.1.2; or V4.2.2.0 thru V4.2.2.3; or V4.2.3.0 thru V4.2.3.8; or V5.0.0.0 thru V5.0.0.2, and
2. RPO snapshots are enabled on the fileset, and
3. Recovery was run using the RPO snapshot (indicated by a message in the mmfs log).

Recommendations:
- Any customer meeting these conditions should either upgrade to a level of code containing the fix, or obtain and apply an efix for their level of code by contacting IBM Service:

Users running IBM Spectrum Scale V4.2.0.0 thru V4.2.3.8 should apply IBM Spectrum Scale V4.2.3.9 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.2.2&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06269.

Users running IBM Spectrum Scale V5.0.0.0 thru V5.0.0.2 should apply IBM Spectrum Scale V5.0.1.0 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=5.0.1&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06307.

- If you believe that your GPFS file system may be affected by this issue, please contact IBM Service as soon as possible for further guidance and assistance.

5. AFM may not replicate the data when the dm_write_invis() API is used to write. In addition, the dm_read_invis() API may read incorrect data if the file is not already cached.

Problem Summary:
AFM does not send any requests to the gateway node when using the dm_read_invis() or dm_write_invis() APIs to read or write data. This may cause a data mismatch between cache (or primary) and home (or secondary). This issue does not affect HSM applications.
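The following toy model contrasts the normal I/O path with the invisible DMAPI path described above. AfmFilesetModel and its methods are hypothetical stand-ins, not the real DMAPI or AFM interfaces; an empty byte string represents the incorrect data returned for a file that is not yet cached.

```python
class AfmFilesetModel:
    """Toy model of an AFM cache fileset and its gateway queue."""

    def __init__(self, home):
        self.home = home   # path -> bytes at home (or secondary)
        self.local = {}    # path -> bytes stored at the cache (or primary)
        self.queue = []    # replication requests for the gateway node

    def read(self, path):
        if path not in self.local:           # not cached yet: fetch via gateway
            self.local[path] = self.home[path]
        return self.local[path]

    def write(self, path, data):
        self.local[path] = data
        self.queue.append(("write", path))   # normal path queues replication

    def read_invis(self, path):
        # Modeled flaw: no gateway request, so an uncached file returns local state.
        return self.local.get(path, b"")

    def write_invis(self, path, data):
        # Modeled flaw: data is stored locally but never queued for replication.
        self.local[path] = data

    def flush(self):
        """Gateway replays queued requests to home."""
        for _op, path in self.queue:
            self.home[path] = self.local[path]
        self.queue.clear()


fs = AfmFilesetModel(home={"f": b"home data"})
print(fs.read_invis("f"))   # b'': uncached file, nothing fetched
fs.write_invis("g", b"new")
fs.flush()
print("g" in fs.home)       # False: the invisible write was never replicated
```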

Users affected:
1. AFM caching is running on GPFS V4.1.0.0 thru V4.1.0.8, or IBM Spectrum Scale V4.1.1.0 thru V4.1.1.19, or V4.2.0.0 thru V4.2.0.4; or V4.2.1.0 thru V4.2.1.2; or V4.2.2.0 thru V4.2.2.3; or V4.2.3.0 thru V4.2.3.8; or V5.0.0.0 thru V5.0.0.2, and
2. An application is using the dm_read_invis() or dm_write_invis() API on AFM filesets.

Recommendations:
- Any user meeting both conditions should either upgrade to a level of code containing the fix, or obtain and apply an efix for their level of code by contacting IBM Service:

Users running GPFS V4.1.0.0 thru V4.1.0.8 or IBM Spectrum Scale V4.1.1.0 thru V4.1.1.19 should apply IBM Spectrum Scale V4.1.1.20 or later, available from Fix Central at: http://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.1.1&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06264.

Users running IBM Spectrum Scale V4.2.0.0 thru V4.2.3.8 should apply IBM Spectrum Scale V4.2.3.9 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.2.2&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06269.

Users running IBM Spectrum Scale V5.0.0.0 thru V5.0.0.2 should apply IBM Spectrum Scale V5.0.1.0 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=5.0.1&platform=All&function=all, or contact IBM Service to obtain and apply an efix, reference APAR IJ06307.

- If you believe that your GPFS file system may be affected by this issue, please contact IBM Service as soon as possible for further guidance and assistance.

Products:
IBM Spectrum Scale (AIX, Linux), versions 4.1.1, 4.2.0, 4.2.1, 4.2.2, 4.2.3, 5.0.0
IBM Elastic Storage Server (Linux), versions 4.0, 4.5, 5.0, 5.1, 5.2, 5.3
Business unit: IBM Infrastructure w/TPS
Line of business: Storage

Document Information

Modified date:
26 September 2022

UID

ibm10713675