IBM Support

IBM Spectrum Scale (GPFS) AFM incorrectly replicates data when write and truncate operations are interleaved

Flashes (Alerts)


Abstract

IBM has identified an issue with AFM in IBM GPFS (V3.5.0.0 through V3.5.0.34, or V4.1.0.0 through V4.1.0.8), or IBM Spectrum Scale (V4.1.1.0 through V4.1.1.16, or V4.2.0.0 through V4.2.3.4) levels where AFM might not transfer write operations completely when a file is truncated. This may cause a data mismatch between cache (or primary) and home (or secondary). This issue may result in undetected data corruption at home (or secondary).

Content


Problem Summary:

AFM merges write operations on the same file, or creates a list of all writes for the same file if the ranges cannot be merged, to minimize the number of write operations queued at a gateway node. When a file is truncated, AFM might not transfer the queued write operations completely, so writes made at the cache (or primary) may not be properly reflected at home (or secondary). This may result in silent data corruption at home (or secondary), because outdated data could be read.

Applications may then get incorrect data after failover to the secondary, or when a cache (the same cache if the file has been evicted, or a different cache) reads the file from home. In both cases, updates that were reflected only in the cache are lost. No error messages are written to the logs, and no errors are returned to the application, when this problem occurs.

This problem happens when a file is written and truncated in a particular order within a small time window. For example, a write to a file at a larger offset, followed by a write at a smaller offset, followed by a truncate of the file to a size smaller than the first write offset:

Write 1MB at offset 1MB.
Write 256KB at offset 0.
Truncate to 512KB.

When the operations are performed in the above order, and all three occur within a short period of time (the afmAsyncDelay interval configured on the fileset, which defaults to 15 seconds), both write operations are dropped, resulting in corruption at the home site.
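
To make the expected outcome concrete, the sketch below applies the same three operations to an ordinary local file and states the content a correct replica should hold afterwards. It does not use AFM and does not reproduce the defect. The function names and the fill bytes `A` and `B` are illustrative assumptions. Because the 256KB write at offset 0 lies entirely below the 512KB truncate size, that data is still part of the file and should reach home; only the 1MB write at offset 1MB is discarded by the truncate.

```python
import os
import tempfile

KB = 1024
MB = 1024 * KB


def apply_sequence(path):
    """Perform the three operations from the example on a local file."""
    with open(path, "wb") as f:
        f.seek(1 * MB)
        f.write(b"A" * MB)          # Write 1MB at offset 1MB
        f.seek(0)
        f.write(b"B" * (256 * KB))  # Write 256KB at offset 0
        f.truncate(512 * KB)        # Truncate to 512KB


def expected_content():
    """Content a correct replica should hold after the sequence.

    The first 256KB come from the second write; the range from 256KB
    to 512KB was never written, so it reads back as zeros.
    """
    return b"B" * (256 * KB) + b"\x00" * (256 * KB)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "afm_example.bin")
        apply_sequence(path)
        with open(path, "rb") as f:
            data = f.read()
        print("size matches:", len(data) == 512 * KB)
        print("content matches:", data == expected_content())
```

A home (or secondary) copy that is missing the 256KB of data at offset 0 after this sequence would be consistent with the problem described above.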

Users affected:

AFM users running IBM GPFS V3.5.0.0 through V3.5.0.34, or V4.1.0.0 through V4.1.0.8, or IBM Spectrum Scale V4.1.1.0 through V4.1.1.16, or V4.2.0.0 through V4.2.3.4. This issue affects all AFM replication modes, including independent-writer, single-writer, and DR.
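
As a convenience, the affected ranges above can be encoded as a simple lookup. This is a minimal sketch, not an IBM tool: the names `AFFECTED_RANGES`, `parse_version`, and `is_affected` are hypothetical, and the sketch assumes you already have the installed level as a four-part string such as `4.2.3.4`. It does not account for efixes applied on top of an affected level.

```python
# Affected version ranges (inclusive), taken from this flash.
AFFECTED_RANGES = [
    ((3, 5, 0, 0), (3, 5, 0, 34)),   # IBM GPFS V3.5.0.x
    ((4, 1, 0, 0), (4, 1, 0, 8)),    # IBM GPFS V4.1.0.x
    ((4, 1, 1, 0), (4, 1, 1, 16)),   # IBM Spectrum Scale V4.1.1.x
    ((4, 2, 0, 0), (4, 2, 3, 4)),    # IBM Spectrum Scale V4.2.0.0 through V4.2.3.4
]


def parse_version(text):
    """Turn 'V4.2.3.4' or '4.2.3.4' into a tuple of four integers."""
    parts = text.strip().lstrip("Vv").split(".")
    if len(parts) != 4:
        raise ValueError("expected a four-part version, got %r" % text)
    return tuple(int(p) for p in parts)


def is_affected(version_text):
    """Return True if the version falls inside any affected range."""
    version = parse_version(version_text)
    return any(low <= version <= high for low, high in AFFECTED_RANGES)
```

For example, `is_affected("4.2.3.4")` is true, while `is_affected("4.2.3.5")` and `is_affected("4.1.1.17")` are false because those are the fixed levels named in the recommendations.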

Recommendations:

1. Users running IBM Spectrum Scale V4.2.0.0 through V4.2.3.4 should apply IBM Spectrum Scale V4.2.3.5, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.2.3&platform=All&function=all, or contact IBM Service to obtain and apply the efix for your level of code, referencing APAR IV99796.

2. Users running IBM GPFS V4.1.0.0 through V4.1.0.8, or IBM Spectrum Scale V4.1.1.0 through V4.1.1.16, should apply IBM Spectrum Scale V4.1.1.17, available from Fix Central at: https://www-945.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=All&platform=All&function=all, or contact IBM Service to obtain and apply the efix for your level of code, referencing APAR IV99764.

3. Users running IBM GPFS V3.5.0.0 through V3.5.0.34 should upgrade to IBM Spectrum Scale V4.1.1.17, available from Fix Central at: https://www-945.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=All&platform=All&function=all

4. Verify file checksums between cache (or primary) and home (or secondary), and copy the data for mismatched files from cache (or primary) to home (or secondary).

5. If you believe your IBM Spectrum Scale (GPFS) file system may be affected by this issue, please contact IBM Service as soon as possible for further guidance and assistance.
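
The checksum verification in recommendation 4 could be scripted along the following lines. This is a minimal sketch under stated assumptions, not an IBM-provided procedure: it assumes both the cache (or primary) and home (or secondary) trees are mounted and readable from the node running the script, that no replication operations are still pending, and that the files being compared are fully cached. The function names and the choice of SHA-256 are illustrative.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(cache_root, home_root):
    """List paths (relative to cache_root) that are missing or differ at home."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            cache_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(cache_path, cache_root)
            home_path = os.path.join(home_root, rel_path)
            if not os.path.isfile(home_path):
                mismatched.append(rel_path)
            elif file_sha256(cache_path) != file_sha256(home_path):
                mismatched.append(rel_path)
    return sorted(mismatched)
```

Calling `find_mismatches` with the two mount points (hypothetical paths such as `/gpfs/cache/fileset1` and `/gpfs/home/fileset1`) returns the relative paths to copy from cache (or primary) to home (or secondary).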


Document Information

Modified date:
26 September 2022

UID

ssg1S1010629