Fast I/O Failure and dynamic tracking interaction
Although Fast I/O Failure and dynamic tracking of Fibre Channel (FC) devices are technically separate features, enabling one can change how the other behaves in certain situations. The following table shows how the FC drivers behave under each combination of these settings:
dyntrk | fc_err_recov | FC Driver Behavior |
---|---|---|
no | delayed_fail | The default setting. This is the legacy behavior of previous versions of AIX®. The FC drivers do not recover if the SCSI ID of a device changes, and I/Os take longer to fail when a link loss occurs between a remote storage port and switch. This might be preferable in single-path situations if dynamic tracking support is not a requirement. |
no | fast_fail | If the driver receives a Registered State Change Notification (RSCN) from the switch, this could indicate a link loss between a remote storage port and switch. After an initial 15-second delay, the FC drivers query to see if the device is on the fabric. If not, I/Os are flushed back by the adapter. Future retries or new I/Os fail immediately if the device is still not on the fabric. If the FC drivers detect that the device is on the fabric but the SCSI ID has changed, the FC device drivers do not recover, and the I/Os fail with PERM errors. |
yes | delayed_fail | If the driver receives an RSCN from the switch, this could indicate a link loss between a remote storage port and switch. After an initial 15-second delay, the FC drivers query to see if the device is on the fabric. If not, I/Os are flushed back by the adapter. Future retries or new I/Os fail immediately if the device is still not on the fabric, although the storage drivers (disk, tape, FastT) might inject a small delay (2-5 seconds) between I/O retries. If the FC drivers detect that the device is on the fabric but the SCSI ID has changed, the FC device drivers reroute traffic to the new SCSI ID. |
yes | fast_fail | If the driver receives an RSCN from the switch, this could indicate a link loss between a remote storage port and switch. After an initial 15-second delay, the FC drivers query to see if the device is on the fabric. If not, I/Os are flushed back by the adapter. Future retries or new I/Os fail immediately if the device is still not on the fabric. The storage drivers (disk, tape, FastT) will likely not delay between retries. If the FC drivers detect that the device is on the fabric but the SCSI ID has changed, the FC device drivers reroute traffic to the new SCSI ID. |
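The table can also be read as a lookup keyed on the two attribute values. The following is a minimal Python sketch of that reading, not driver code: the `FcBehavior` type, its two field names, and the `behavior_for` helper are assumptions introduced for illustration, and the model captures only the two outcomes the table states for every row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FcBehavior:
    # True if, after an RSCN and the initial delay, I/Os to a device that is
    # no longer on the fabric are flushed back and later I/Os fail immediately.
    fails_fast_when_off_fabric: bool
    # True if traffic is rerouted when the device reappears with a new SCSI ID.
    reroutes_on_scsi_id_change: bool


# Keys are (dyntrk, fc_err_recov), mirroring the table columns.
BEHAVIOR = {
    ("no", "delayed_fail"): FcBehavior(False, False),  # default, legacy behavior
    ("no", "fast_fail"): FcBehavior(True, False),      # SCSI ID change: PERM errors
    ("yes", "delayed_fail"): FcBehavior(True, True),   # retries might be delayed
    ("yes", "fast_fail"): FcBehavior(True, True),      # retries likely not delayed
}


def behavior_for(dyntrk: str, fc_err_recov: str) -> FcBehavior:
    """Return the modeled behavior for one combination of settings."""
    try:
        return BEHAVIOR[(dyntrk, fc_err_recov)]
    except KeyError:
        raise ValueError(f"unknown combination: {dyntrk!r}, {fc_err_recov!r}")
```

In this simplified model the two rows with dyntrk set to yes are identical; the only difference the table gives between them is whether the storage drivers pause between retries.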
When dynamic tracking is disabled, there is a marked difference between the delayed_fail and fast_fail settings of the fc_err_recov attribute. However, with dynamic tracking enabled, the setting of the fc_err_recov attribute is less significant. This is because there is some overlap in the dynamic tracking and fast fail error-recovery policies. Therefore, enabling dynamic tracking inherently enables some of the fast fail logic.
The general error recovery procedure when a device is no longer reachable on the fabric is the same for both fc_err_recov settings with dynamic tracking enabled. The minor difference is that the storage drivers can choose to inject delays between I/O retries if fc_err_recov is set to delayed_fail. This increases the I/O failure time by an additional amount, depending on the delay value and number of retries, before permanently failing the I/O. With high I/O traffic, however, the difference between delayed_fail and fast_fail might be more noticeable.
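To get a feel for the size of that additional amount, the arithmetic can be sketched as delay multiplied by retries. This is a rough model that assumes one fixed delay before every retry; the function name and the retry count of 5 are hypothetical, because the actual retry count depends on the storage driver. The 2 and 5 second values come from the delay range given in the table.

```python
def added_failure_time(retry_delay_s: float, retries: int) -> float:
    """Extra seconds before an I/O fails permanently, assuming the storage
    driver waits retry_delay_s before each of `retries` retries."""
    if retry_delay_s < 0 or retries < 0:
        raise ValueError("delay and retry count must be non-negative")
    return retry_delay_s * retries


# Hypothetical example: 5 retries at the low and high end of the 2-5 second range.
low = added_failure_time(2.0, 5)   # 10.0 seconds
high = added_failure_time(5.0, 5)  # 25.0 seconds
```

With no injected delay the added time is zero, which in this model corresponds to the fast_fail case.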
SAN administrators might want to experiment with these settings to find the correct combination of settings for their environment.
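One way to organize such an experiment is to generate the attribute change for each of the four combinations up front. The sketch below only builds `chdev` argument lists and does not run anything. The device name `fscsi0` and both helper functions are assumptions for illustration, and the `-P` flag is included on the assumption that the change should be recorded now and applied when the device is next configured; check the `chdev` documentation for your AIX level before using it.

```python
from itertools import product

DYNTRK_VALUES = ("no", "yes")
FC_ERR_RECOV_VALUES = ("delayed_fail", "fast_fail")


def chdev_command(device: str, dyntrk: str, fc_err_recov: str,
                  defer: bool = True) -> list[str]:
    """Build (but do not run) a chdev invocation that sets both attributes."""
    if dyntrk not in DYNTRK_VALUES:
        raise ValueError(f"dyntrk must be one of {DYNTRK_VALUES}")
    if fc_err_recov not in FC_ERR_RECOV_VALUES:
        raise ValueError(f"fc_err_recov must be one of {FC_ERR_RECOV_VALUES}")
    cmd = ["chdev", "-l", device,
           "-a", f"dyntrk={dyntrk}",
           "-a", f"fc_err_recov={fc_err_recov}"]
    if defer:
        cmd.append("-P")  # record the change without altering the live device
    return cmd


def all_combinations(device: str) -> list[list[str]]:
    """One command per row of the behavior table, in table order."""
    return [chdev_command(device, d, f)
            for d, f in product(DYNTRK_VALUES, FC_ERR_RECOV_VALUES)]


if __name__ == "__main__":
    # "fscsi0" is a placeholder device name.
    for cmd in all_combinations("fscsi0"):
        print(" ".join(cmd))
```

Keeping command construction separate from execution lets the list be reviewed before any setting is changed on a production adapter.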