Flashes (Alerts)
Abstract
DS8870 systems that are attached to a System z host, use PAVs as described below, and run firmware below the levels identified in the Resolution / Support information are exposed to potential loss of access or data when PAV base and alias volume addresses are within the same metadata (MD) track (each MD track supports 32 addresses).
Content
To encounter the impacting event, the system must be running in a dual-node operational state, and the trigger must be a failure that causes the loss of a single cluster, resulting in an unexpected failover to the remaining cluster. If the following configurations are present during the failover, loss of access will occur:
- Replication activity (including, but not limited to, Incremental FlashCopy, Metro Mirror, and Global Mirror)
- Base addresses and alias addresses coexisting within the same metadata track (that is, within the same 32-address range). The method for determining this, with an example, is described below.
How to determine if a DS8870 has a base and alias address within the same MD (metadata) track:
Each LCU on a DS8870 can support 256 addresses, which may be either base or PAV alias addresses. For the purpose of storing metadata, these 256 addresses are divided into eight 32-address ranges: 00-1F, 20-3F, 40-5F, 60-7F, 80-9F, A0-BF, C0-DF, and E0-FF. As long as each range contains only base addresses or only alias addresses, the DS8870 is not exposed to the issue described above. However, if both base and alias devices are defined within the same address range, the DS8870 has base and alias addresses within the same MD (metadata) track and is exposed.
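The range check described above can be sketched in Python as a quick way to scan one LCU's layout. The `mixed_md_tracks` function and its dictionary input format are hypothetical illustrations, not part of any DS8870 tool; you would populate the map from your own configuration data.

```python
from typing import Dict, List, Set

MD_TRACK_SIZE = 32  # addresses per metadata (MD) track

def mixed_md_tracks(lcu: Dict[int, str]) -> List[str]:
    """Return the address ranges (e.g. "40-5F") holding both base and alias devices.

    `lcu` is a hypothetical map from device address (0x00-0xFF) to "base" or
    "alias"; addresses that are not defined are simply absent.
    """
    kinds_per_track: Dict[int, Set[str]] = {}
    for addr, kind in lcu.items():
        if not 0x00 <= addr <= 0xFF:
            raise ValueError(f"address {addr:#04x} is outside the 256-address LCU")
        kinds_per_track.setdefault(addr // MD_TRACK_SIZE, set()).add(kind)
    mixed = []
    for track in sorted(kinds_per_track):
        if {"base", "alias"} <= kinds_per_track[track]:
            start = track * MD_TRACK_SIZE
            mixed.append(f"{start:02X}-{start + MD_TRACK_SIZE - 1:02X}")
    return mixed

# Example: 77 bases at 00-4C and aliases at 4D-FF, so range 40-5F is mixed.
lcu = {addr: "base" for addr in range(0x00, 0x4D)}
lcu.update({addr: "alias" for addr in range(0x4D, 0x100)})
print(mixed_md_tracks(lcu))  # ['40-5F']
```

An empty result means every range is base-only, alias-only, or unused for that LCU.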
For example, each MD track holds only 32 addresses, so allocating 64 base addresses starting at 00 (00-3F) consumes exactly two MD tracks. The PAV aliases are then on separate MD tracks, so no MD track holds both base and alias addresses, and the system is not exposed to the issue.
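The arithmetic in this example can be sketched as follows. The helper names are hypothetical, and the sketch assumes the base addresses are contiguous and start at 00, as in the example; the 77-base case shows where the next alias-only range would begin when the base count is not a multiple of 32.

```python
MD_TRACK_SIZE = 32  # addresses per MD track

def md_track(addr: int) -> int:
    """Index (0-7) of the MD track that holds a given LCU address."""
    return addr // MD_TRACK_SIZE

def first_alias_safe_address(base_count: int) -> int:
    """First address on an MD track with no bases, assuming bases occupy 00 upward."""
    tracks_used = -(-base_count // MD_TRACK_SIZE)  # ceiling division
    return tracks_used * MD_TRACK_SIZE

for n in (64, 77):
    last_base = n - 1
    safe = first_alias_safe_address(n)
    print(f"{n} bases: last base {last_base:02X} on track {md_track(last_base)}, "
          f"aliases safe from {safe:02X}")
# 64 bases: last base 3F on track 1, aliases safe from 40
# 77 bases: last base 4C on track 2, aliases safe from 60
```

Rounding the base count up to the next multiple of 32 gives, under that assumption, the first address that cannot share an MD track with a base device.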
During the failover, the DS8870 firmware marks the alias addresses in the "mixed" metadata track as base devices and tries to stage metadata from them. Because an alias is only a pointer to a device, the system panics, taking down the remaining cluster.
The exposure to this issue only exists on the DS8870, and cannot be encountered on DS8100, DS8300, DS8700 or DS8800.
Mitigation
Ensure that alias and base volume addresses do not fall within the same metadata (MD) track. This avoids the error.
- If the number of base device addresses is a multiple of 32 (32, 64, 96, 128, 160, 192, 224, or 256), or if the base and alias volume addresses are not in the same range (00-1F, 20-3F, 40-5F, 60-7F, 80-9F, A0-BF, C0-DF, or E0-FF), no action is required.
- If the number is not a multiple of 32 (for example, 24, 56, or 77), ensure that alias volume addresses and base volume addresses are not in the same 32-address range.
- If there is a gap of 32 or more addresses between the base and alias addresses, no action is required.
- Remove alias addresses:
Static/Dynamic PAV:
The following process needs to be performed:
1. Delete all alias addresses in the same 32-address range as base addresses, or delete all aliases. Note: The action to take will be client dependent.
2. Optionally, add the aliases deleted in step 1 into another 32-address range that contains only alias addresses.
3. Perform steps 1 and 2 for all LCUs.
The following process needs to be performed:
1. Delete all alias addresses in the same 32-address range as base addresses.
2. Optionally, add the aliases deleted in step 1 into another 32-address range that contains only alias addresses.
3. Perform steps 1 and 2 for all LCUs.
4. Issue a warmstart using the DSCLI command: dscli> diagsi -action warmstart IBM.2107-68FA121
- DSCLI users can obtain the command information here: http://publib.boulder.ibm.com/infocenter/dsichelp/ds8000ic/index.jsp?topic=%2Fcom.ibm.storage.ssic.help.doc%2Ff2c_clidiagsi_2hsk4c.html&resultof=%22warmstart%22
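For either alias-relocation procedure above, the addresses involved can be planned per LCU before any configuration change. The following Python sketch is a hypothetical planning aid only: it issues no DSCLI or host commands, and the dictionary input format is an assumption. Run it once per LCU, in keeping with the instruction to cover all LCUs.

```python
from typing import Dict, List, Set, Tuple

MD_TRACK_SIZE = 32
TRACKS_PER_LCU = 256 // MD_TRACK_SIZE  # eight ranges: 00-1F ... E0-FF

def relocation_plan(lcu: Dict[int, str]) -> Tuple[List[int], List[int]]:
    """Plan alias moves for one LCU; this only computes addresses, it changes nothing.

    Returns (aliases_to_delete, alias_only_free_slots):
      - aliases that share an MD track with at least one base device, and
      - unused addresses on tracks with no base devices, where deleted aliases
        could optionally be re-added without creating a mixed track.
    """
    kinds: Dict[int, Set[str]] = {t: set() for t in range(TRACKS_PER_LCU)}
    for addr, kind in lcu.items():
        kinds[addr // MD_TRACK_SIZE].add(kind)
    to_delete = sorted(
        addr for addr, kind in lcu.items()
        if kind == "alias" and "base" in kinds[addr // MD_TRACK_SIZE]
    )
    free = [
        addr for addr in range(256)
        if addr not in lcu and "base" not in kinds[addr // MD_TRACK_SIZE]
    ]
    return to_delete, free

# Example: 77 bases at 00-4C and 19 aliases at 4D-5F (all in the mixed range 40-5F).
lcu = {addr: "base" for addr in range(0x00, 0x4D)}
lcu.update({addr: "alias" for addr in range(0x4D, 0x60)})
delete, free = relocation_plan(lcu)
print(f"delete {len(delete)} aliases ({delete[0]:02X}-{delete[-1]:02X}); "
      f"{len(free)} alias-only slots from {free[0]:02X}")
# delete 19 aliases (4D-5F); 160 alias-only slots from 60
```

Free slots are taken only from ranges without base devices, so re-adding aliases there keeps every range either base-only or alias-only.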
Resolution / Support:
The fix to this issue is now available in DS8870 bundles 87.5.21.0 (release 7.0) and 87.10.89.0 (release 7.1). Clients with DS8870s that use parallel access volumes (PAVs) and are exposed to this issue should either avoid having base and alias addresses in the same metadata track by relocating the exposed alias addresses as described above (in which case a code update is not required), or arrange with IBM Service to update their firmware to a bundle containing the fix at their earliest convenience.
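To check whether an installed bundle level already includes the fix, a simple comparison against the two fixed bundles listed above can be sketched in Python. Treating the first two bundle fields (87.5 and 87.10) as identifying the release 7.0 and 7.1 streams is an assumption of this sketch; for any other stream it returns `None`, meaning the level should be confirmed with IBM Service.

```python
from typing import Optional, Tuple

# Fix levels named in this Flash. Keying by the first two bundle fields is an
# assumption made for this sketch: 87.5.x.x -> release 7.0, 87.10.x.x -> release 7.1.
FIX_BUNDLES = {
    (87, 5): (87, 5, 21, 0),    # release 7.0
    (87, 10): (87, 10, 89, 0),  # release 7.1
}

def parse_bundle(level: str) -> Tuple[int, ...]:
    """Turn a dotted bundle level such as "87.5.21.0" into a comparable tuple."""
    return tuple(int(part) for part in level.split("."))

def has_fix(level: str) -> Optional[bool]:
    """True/False for the streams above; None if the stream is not covered here."""
    bundle = parse_bundle(level)
    fix = FIX_BUNDLES.get(bundle[:2])
    if fix is None:
        return None  # not covered by this sketch; confirm with IBM Service
    return bundle >= fix

print(has_fix("87.5.20.0"))   # False
print(has_fix("87.10.89.0"))  # True
print(has_fix("87.0.1.0"))    # None
```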
Document Information
Modified date:
25 September 2022
UID
ssg1S1004395