/usr/lpp/bos/README.FIBRE-CHANNEL from AIX V5.2 Update

# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
#
# bos52F src/bos/kernext/fcp/README.FIBRE-CHANNEL.S 1.3
#
# Licensed Materials - Property of IBM
#
# Restricted Materials of IBM
#
# (C) COPYRIGHT International Business Machines Corp. 2003
# All Rights Reserved
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
#
# IBM_PROLOG_END_TAG
************************** README.FIBRE-CHANNEL ******************************

  * Introduction
  * Fast I/O Failure for Fibre Channel Devices
  * Dynamic Tracking of Fibre Channel Devices
  * Fast Fail and Dynamic Tracking Interaction
  * Known Hardware Issues

============
Introduction
============

The purpose of this README is to introduce two new features in the Fibre
Channel Device Driver stack, "Fast I/O Failure for Fibre Channel Devices"
and "Dynamic Tracking of Fibre Channel Devices".


==========================================
Fast I/O Failure for Fibre Channel Devices
==========================================

This release of AIX supports Fast I/O Failure for Fibre Channel devices
after link events in a switched environment.  If the FC adapter driver
detects a link event such as a lost link between a storage device and a
switch, the FC adapter driver will wait a short period of time, on the order
of 15 seconds, to allow the fabric to stabilize.  At that point, if the FC
adapter driver detects that the device is not on the fabric, it will begin
failing all I/Os at the adapter driver.  Any new I/O or future retries of
the failed I/Os will be failed immediately by the adapter until the adapter
driver detects that the device has rejoined the fabric.

Fast Failure of I/O is controlled by a new fscsi device attribute,
'fc_err_recov'.  The default setting for this attribute is 'delayed_fail',
which is the I/O failure behavior that has existed in previous versions
of AIX.  Setting this attribute to 'fast_fail', as shown in the example

chdev -l fscsi0 -a fc_err_recov=fast_fail

(assuming the fscsi device instance is fscsi0), enables fast I/O failure.
Fast fail logic is invoked when the adapter driver receives a Registered
State Change Notification (RSCN) from the switch indicating that there has
been a link event involving a remote storage device port.
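
As a minimal sketch of the steps above, the following script shows the
current 'fc_err_recov' value and then sets it to 'fast_fail'.  The helper
names (run, enable_fast_fail) and the DRY_RUN variable are hypothetical and
exist only for this illustration; lsattr and chdev are the standard AIX
commands.  By default the script only prints the commands, so it can be
reviewed on any system before being run on an AIX host.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (the default here) prints commands instead of
# running them.  Set DRY_RUN=0 on an AIX host to execute them.
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS... : echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# enable_fast_fail DEV : show the current fc_err_recov value for DEV,
# then set the attribute to fast_fail.
enable_fast_fail() {
    run lsattr -El "$1" -a fc_err_recov &&
    run chdev -l "$1" -a fc_err_recov=fast_fail
}

enable_fast_fail fscsi0
```

If chdev reports that the device is busy, the change may have to be deferred,
for example with the chdev -P flag followed by a reboot; check the chdev
documentation for your AIX level before relying on this.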

Fast I/O failure may be desirable in situations where multipathing software
is being used.  Setting 'fc_err_recov' to 'fast_fail' may decrease the I/O
fail times due to link loss between the storage device and switch and allow
faster failover to alternate paths.

In single-path configurations, especially configurations with a single path
to a paging device, the default 'delayed_fail' setting is the
recommended setting.

The requirements for Fast I/O Failure support are the following:

- Fast Fail is only supported in a switched environment.  It is not supported
  in arbitrated loop environments, including public loop.

- FC 6227 adapter firmware - level 3.22A1 or greater.

- FC 6228 adapter firmware - level 3.82A1 or greater.

- FC 6239 adapter firmware - all firmware levels


If these requirements are not met, the fscsi device may log an error log
entry of type INFO indicating that one of these requirements was not met
and that fast fail has NOT been enabled.
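
The firmware requirements above can be captured in a small lookup, sketched
here for use in a pre-check script.  The function name min_fw is hypothetical;
the feature codes and levels are the ones listed in this section.  Comparing
an installed level against the minimum is deliberately left out, because the
ordering rules for these level strings are not described here.

```shell
#!/bin/sh
# min_fw FEATURE_CODE : print the minimum adapter firmware level listed in
# this README for Fast Fail support, "any" if all levels qualify, or return
# non-zero for a feature code this README does not mention.
min_fw() {
    case "$1" in
        6227) echo "3.22A1" ;;
        6228) echo "3.82A1" ;;
        6239) echo "any" ;;
        *)    echo "unknown adapter feature code: $1" >&2; return 1 ;;
    esac
}

for fc in 6227 6228 6239; do
    echo "FC $fc: minimum firmware $(min_fw "$fc")"
done
```

The installed firmware level has to be read from the adapter itself, for
example from the vital product data shown by 'lscfg -vl fcs0' (assuming the
adapter instance is fcs0); which field carries the level depends on the
adapter.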


=========================================
Dynamic Tracking of Fibre Channel Devices
=========================================

This release provides support for Dynamic Tracking of Fibre Channel Devices.
Previous releases of AIX required a user to unconfigure FC storage device and
adapter device instances before making changes on the SAN that might result
in an N_Port ID (SCSI ID) change of any remote storage ports.

If Dynamic Tracking of FC Devices is enabled, the FC adapter driver will
detect when the Fibre Channel N_Port ID of a device changes and then re-route
traffic destined for that device to the new address while the devices are
still online.  Examples of events that can cause an N_Port ID to change are
moving a cable between a switch and storage device from one switch port to
another, connecting two separate switches via an Inter-Switch Link (ISL),
and possibly rebooting a switch.

Dynamic Tracking of FC devices is controlled by a new fscsi device attribute,
'dyntrk'.  The default setting for this attribute is 'no'.  Setting this
attribute to 'yes', as shown in the example

chdev -l fscsi0 -a dyntrk=yes

(assuming the fscsi device instance is fscsi0), enables dynamic tracking.
Dynamic Tracking logic is invoked when the adapter driver receives an
indication from the switch that there has been a link event involving a
remote storage device port.
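
On hosts with several FC adapters, the same chdev command is needed for each
fscsi instance.  The sketch below generates one command per instance named
on its argument list and executes nothing itself.  The function name
dyntrk_cmds is hypothetical, and the instance names fscsi0 and fscsi1 are
examples only.

```shell
#!/bin/sh
# Sketch: print one chdev command per fscsi instance.  Nothing is executed;
# review the output, then run it on the AIX host.

# dyntrk_cmds DEV... : emit the command that enables dynamic tracking for
# each DEV, skipping names that do not look like fscsi instances.
dyntrk_cmds() {
    for dev in "$@"; do
        case "$dev" in
            fscsi[0-9]*) echo "chdev -l $dev -a dyntrk=yes" ;;
            *)           echo "skipping $dev: not an fscsi instance" >&2 ;;
        esac
    done
}

dyntrk_cmds fscsi0 fscsi1
```

On an AIX host, the list of instances could be obtained by filtering the
output of 'lsdev -C' for names that begin with fscsi.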

The requirements for Dynamic Tracking support are the same as those for
"Fast I/O Failure for Fibre Channel Devices" and also include the following
requirements:

- Dynamic Tracking requires that device World Wide Name (Port Name) and
  Node Names remain constant, and that World Wide Name be unique.
  Changing the World Wide Name or Node Name of an available or on-line device
  could result in I/O failures.

  In addition, each FC storage device instance must have world_wide_name
  and node_name attributes.  Updated filesets that contain the 'sn_location'
  attribute mentioned in the next bullet should also be updated to contain
  both of these attributes.

- The storage device must provide a reliable method to extract a unique serial
  number for each LUN. The AIX FC device drivers will not autodetect serial
  number location, so the method for serial number extraction must be
  explicitly provided by any storage vendor in order to support
  dynamic tracking for their devices. This information is conveyed to the
  drivers via the 'sn_location' ODM attribute for each storage
  device.  If the disk or tape driver detects that the 'sn_location'
  ODM attribute is missing, an error log of type INFO will be generated and
  dynamic tracking will NOT be enabled.

  Note: The 'sn_location' attribute may be non-displayable, so
  running the 'lsattr' command on an hdisk, for example, may not show
  the attribute, but it may, indeed, be present in ODM.

- The FC device drivers will be able to track devices on a SAN fabric, where
  a SAN fabric is defined as a fabric as seen from a single host bus adapter,
  if the N_Port IDs on the fabric stabilize within about 15 seconds.
  If cables are not reseated or N_Port IDs continue to change after the
  initial 15 seconds, I/O failures could result.

- Devices will not be tracked across host bus adapters. Devices will only
  track if they remain visible from the same HBA that they were originally
  connected to.

  For example, if device A were moved from one location to another on fabric A
  attached to host bus adapter A (i.e., its N_Port ID on fabric A changes), the
  device would seamlessly be tracked without any user intervention and I/O
  to this device can continue.

  However, if device A is visible from HBA A but not from HBA B,
  and device A is moved from the fabric attached to HBA A to the fabric
  attached to HBA B, device A will not be accessible on fabric A nor on
  fabric B.  User intervention would be required to make it available on
  fabric B by invoking cfgmgr. The AIX device instance on fabric A would no
  longer be usable, and a new device instance on fabric B would be created.
  This device would have to manually be added to volume groups, multipath
  device instances, etc.   In essence, this is the same as removing a device
  from fabric A and adding a new device to fabric B.

- No dynamic tracking will be performed for FC dump devices while an AIX
  system dump is in progress.  In addition, dynamic tracking is not supported
  during boot or during cfgmgr invocations.  SAN changes should not be made
  while any of these operations are in progress.

- Once devices are tracked, ODM will potentially contain stale information as
  SCSI IDs in ODM will no longer reflect actual SCSI IDs on the SAN. ODM will
  remain in this state until cfgmgr is run manually or the system is
  rebooted, provided all drivers, including any third party FC SCSI target
  drivers, are dynamic-tracking capable. If cfgmgr is run manually, cfgmgr
  must be invoked on all affected fscsi devices, which can easily be
  accomplished by running cfgmgr without any options, or by invoking
  cfgmgr on each fscsi device individually.

  Note: Running cfgmgr at run time to recalibrate the SCSI IDs may not
  update the SCSI ID in ODM for a storage device if the storage device is
  currently opened, such as when volume groups are varied on.
  cfgmgr would need to be run on devices that are not opened or the system
  should be rebooted to recalibrate the SCSI IDs.  Note that stale SCSI IDs
  in ODM have no adverse effect on the FC drivers and recalibration of
  SCSI IDs in ODM is not necessary for the FC drivers to function properly.
  Any applications that communicate with the adapter driver directly via
  ioctl calls and use the SCSI ID values from ODM, however, need to be
  updated as indicated in the next bullet to avoid using potentially
  stale SCSI IDs.

- All applications and kernel extensions that communicate with the FC Adapter
  Driver, either via ioctl calls or directly to the FC driver's entry points,
  must support the version 1 ioctl and scsi_buf APIs of the FC Adapter
  Driver in order to work properly with FC dynamic tracking.  Non-compliant
  applications or kernel extensions may not function properly and/or fail
  after a dynamic tracking event.  If the FC adapter driver detects an
  application or kernel extension that is not adhering to the new version 1
  ioctl and/or scsi_buf API, an error log of type INFO will be generated and
  dynamic tracking may not be enabled for the device that this application/
  kernel extension may be trying to communicate with.

  ISVs developing kernel extensions and/or applications that communicate with
  the AIX Fibre Channel Driver stack should refer to the "Fibre Channel
  Protocol for SCSI and iSCSI Subsystem" article in "AIX 5L Version 5.2 Kernel
  Extensions and Device Support Programming Concepts" (pay special attention
  to the "Required FCP and iSCSI Adapter Device Driver ioctl Commands" and
  "Understanding the scsi_buf Structure" sections) for changes necessary
  to support Dynamic Tracking.

- Even with dynamic tracking enabled, users are strongly encouraged to make
  SAN changes, such as cable moves/swaps and establishing ISL links, during
  maintenance windows. Making SAN changes during full production runs is
  discouraged.  This is due to the fact that there is a short interval of
  time to perform any SAN changes.  Cables that are not reseated correctly,
  for example, could result in I/O failures. Performing these operations
  during a time of little/no traffic minimizes impact of I/O failures due to
  misplugging of cables, taking too long to recable, etc.
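
Two of the requirements above map directly onto commands, sketched here:
looking for the 'sn_location' attribute in ODM when lsattr does not display
it, and running cfgmgr against the affected fscsi instances to refresh stale
SCSI IDs.  The function names are hypothetical, and querying the predefined
attribute class PdAt is an assumption about where a vendor fileset stores
'sn_location'.  The script only prints the commands.

```shell
#!/bin/sh
# Sketch: print (do not run) commands related to two of the requirements
# listed above.

# sn_location_query : ODM query for 'sn_location' entries, which lsattr may
# not display.  Assumes the attribute lives in the PdAt object class.
sn_location_query() {
    echo "odmget -q \"attribute=sn_location\" PdAt"
}

# recalibrate_cmds [DEV...] : one cfgmgr invocation per fscsi instance, to
# refresh SCSI IDs in ODM after a tracked change.  With no arguments, plain
# cfgmgr, which covers all devices.
recalibrate_cmds() {
    if [ "$#" -eq 0 ]; then
        echo "cfgmgr"
        return 0
    fi
    for dev in "$@"; do
        echo "cfgmgr -l $dev"
    done
}

sn_location_query
recalibrate_cmds fscsi0 fscsi1
```

As noted above, cfgmgr may not update the SCSI ID of a device that is
currently open, so the generated commands are best run when the affected
devices are closed.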

At the time of this release, the base AIX FC SCSI Disk and FC SCSI Tape
device drivers support dynamic tracking.

For status on dynamic tracking support for the FAStT product line,
refer to the following URLs:

IBM FAStT Storage Manager v7.10:

    http://www-3.ibm.com/pc/support/site.wss/MIGR-40711.html

IBM FAStT Storage Manager v8.21:

    http://www-3.ibm.com/pc/support/site.wss/MIGR-43839.html

IBM FAStT Storage Manager v8.3:

    http://www-3.ibm.com/pc/support/site.wss/MIGR-50177.html

After accepting the license agreement, proceed to the link for
Storage Manager for AIX.  Look for the current version of this
software and follow the link to the associated README.  Product
support and enhancement announcements will be made here.  These
announcements should contain information on what AIX APARs are
required to support dynamic tracking on FAStT.

In addition, the IBM ESS, EMC Symmetrix and HDS storage devices support
dynamic tracking provided that the vendor provides the ODM filesets with
the necessary 'sn_location' and 'node_name' attributes.  Contact the
storage vendor if you are not sure if your current level of ODM fileset
supports dynamic tracking.

If vendor-specific ODM entries are not being used for the storage device,
but the ESS, Symmetrix or HDS storage subsystem is configured with the
displayable message of "MPIO Other FC SCSI Disk", then dynamic tracking
is supported for these devices in this configuration.

The STK tape device using the standard AIX device driver also supports
dynamic tracking provided the STK fileset contains the necessary
'sn_location' and 'node_name' attributes.

Note: It is strongly recommended that SAN changes involving tape devices
be made with no active I/O.  Due to the serial nature of tape devices,
a single I/O failure can cause an application, such as a tape backup,
to fail.

Devices that configure with the displayable messages of "Other FC SCSI Disk"
or "Other FC SCSI Tape" will not support dynamic tracking.


==========================================
Fast Fail and Dynamic Tracking Interaction
==========================================

Although Fast Fail and Dynamic Tracking of FC Devices are technically separate
features, the enabling of one may change the interpretation of the other in
certain situations.  The following table shows the behavior exhibited
by the FC drivers with the various permutations of these settings.


dyntrk  fc_err_recov    FC Driver Behavior
======  ============    ==================
no      delayed_fail    This is the default setting.  This is legacy behavior
                        existing in previous versions of AIX.  The FC drivers
                        will not recover if the scsi_id of a device changes,
                        and I/Os will take longer to fail when a link loss
                        occurs between a remote storage port and switch.
                        This may be desirable in single-path situations
                        if dynamic tracking support is not a requirement.

no      fast_fail       If the driver receives a Registered State Change
                        Notification (RSCN) from the switch, this could
                        indicate a link loss between a remote storage port
                        and switch.  After an initial 15 second delay, the
                        FC drivers will query to see if the device is on
                        the fabric.  If not, I/Os will be flushed back by
                        the adapter.  Future retries or new I/Os will fail
                        immediately if the device is still not on the fabric.

                        If the FC drivers detect the device is on the fabric
                        but the scsi_id has changed, the FC device drivers
                        will not recover, i.e., I/Os will be failed with
                        PERM errors.

yes     delayed_fail    If the driver receives a Registered State Change
                        Notification (RSCN) from the switch, this could
                        indicate a link loss between a remote storage port
                        and switch.  After an initial 15 second delay, the
                        FC drivers will query to see if the device is on
                        the fabric.  If not, I/Os will be flushed back by
                        the adapter.  Future retries or new I/Os will fail
                        immediately if the device is still not on the fabric,
                        although the storage drivers (disk, tape, FAStT)
                        may inject a small delay (2-5 seconds) between
                        I/O retries.

                        If the FC drivers detect the device is on the fabric
                        but the scsi_id has changed, the FC device drivers
                        will reroute traffic to the new scsi_id.

yes     fast_fail       If the driver receives a Registered State Change
                        Notification (RSCN) from the switch, this could
                        indicate a link loss between a remote storage port
                        and switch.  After an initial 15 second delay, the
                        FC drivers will query to see if the device is on
                        the fabric.  If not, I/Os will be flushed back by
                        the adapter.  Future retries or new I/Os will fail
                        immediately if the device is still not on the fabric.
                        The storage driver (disk, tape, FAStT) will likely
                        not delay between retries.

                        If the FC drivers detect the device is on the fabric
                        but the scsi_id has changed, the FC device drivers
                        will reroute traffic to the new scsi_id.


Note that with dynamic tracking disabled, there is a marked difference
between the 'delayed_fail' and 'fast_fail' settings of the 'fc_err_recov'
attribute.  However, with dynamic tracking enabled, the setting of the
'fc_err_recov' attribute is less significant.  This is because there
is some overlap in the dynamic tracking and fast fail error recovery
policies.  As such, enabling dynamic tracking inherently enables some
of the fast fail logic.

The general error recovery procedure when a device is no longer reachable
on the fabric is the same for both 'fc_err_recov' settings with dynamic
tracking enabled, with the minor difference that the storage drivers may
choose to inject delays between I/O retries if 'fc_err_recov' is
set to 'delayed_fail'.  This will increase the I/O failure time by some
additional amount depending on the delay value and number of retries before
permanently failing the I/O.  With high I/O traffic, however, the difference
between 'delayed_fail' and 'fast_fail' may be more noticeable.
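
The four combinations in the table above can be condensed into a lookup,
sketched here as a quick reference for scripts that report the current
settings.  The function name fc_behavior and the wording of the summaries
are this sketch's own; the behavior itself is taken from the table.

```shell
#!/bin/sh
# fc_behavior DYNTRK FC_ERR_RECOV : print a one-line summary of the driver
# behavior for the given attribute values, as described in the table above.
fc_behavior() {
    case "$1/$2" in
        no/delayed_fail)
            echo "id-change: not recovered; link-loss: slow (legacy) failure" ;;
        no/fast_fail)
            echo "id-change: PERM errors; link-loss: fail after ~15 s" ;;
        yes/delayed_fail)
            echo "id-change: rerouted; link-loss: fail after ~15 s, retries may be delayed 2-5 s" ;;
        yes/fast_fail)
            echo "id-change: rerouted; link-loss: fail after ~15 s, retries likely not delayed" ;;
        *)
            echo "usage: fc_behavior {yes|no} {delayed_fail|fast_fail}" >&2
            return 2 ;;
    esac
}

# Print the summary for every combination.
for d in no yes; do
    for r in delayed_fail fast_fail; do
        printf '%-3s %-12s %s\n' "$d" "$r" "$(fc_behavior "$d" "$r")"
    done
done
```

On an AIX host, the two arguments would come from the 'dyntrk' and
'fc_err_recov' values reported by lsattr for the fscsi instance.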

SAN administrators may want to experiment with these settings to find
the correct combination of settings for their environment.


=====================
Known Hardware Issues
=====================

--------
SWITCHES
--------
In general, if problems are encountered due to cable swaps or other operations
involving a switch, the switch firmware should always be updated to the
latest supported levels regardless of the type of switch being used.

IBM 2109 FC Switch
------------------
Some problems were experienced when using these features and performing
multiple cable swaps with the IBM FC 2109 F16 switch at GA firmware
level 3.0.  These problems are solved by updating the switch firmware to
the latest levels.  These features were tested using v3.0.2k firmware
for this switch.

Inrange Switch
--------------
If using an Inrange FC/9000 FC Director, the Inrange switch must
be at a minimum level of "FC 4.1.2.7" firmware in order to support
Dynamic Tracking and Fast Fail.



Document information

More support for: AIX family

Software version: 520

Operating system(s): AIX

Reference #: 520readmefb4520desr_lpp_bos

Modified date: 01 May 2005

