Use this MAP to resolve the following problems:
- Device bus fabric error (SRN nnnn – 4100)
- Temporary device bus fabric error (SRN nnnn – 4101)
The possible causes are:
- A failed connection caused by a failing component in the SAS fabric
between, and including, the adapter and device enclosure.
- A failed connection caused by a failing component within the device
enclosure, including the device itself.
Considerations:
- Remove power from the system before connecting and disconnecting
cables or devices, as appropriate, to prevent hardware damage or erroneous
diagnostic results.
- Some systems have SAS and PCI-X or PCIe bus interface logic integrated
onto the system boards and use a pluggable RAID enablement card (a
non-PCI form factor card) for such integrated-logic buses. See the
feature comparison tables for PCIe and PCI-X cards.
For these configurations, replacement of the RAID enablement card
is unlikely to solve a SAS-related problem because the SAS interface
logic is on the system board.
- Some systems have the disk enclosure or removable media enclosure
integrated in the system with no cables. For these configurations
the SAS connections are integrated onto the system boards and a failed
connection can be the result of a failed system board or integrated
device enclosure.
- Some configurations involve a SAS adapter connecting to internal
SAS disk enclosures within a system using an FC3650 or FC3651 cable
card. When the MAP refers to a device enclosure, it might be
referring to the internal SAS disk slots or media slots. Likewise,
when the MAP refers to a cable, it might include an FC3650 or
FC3651 cable card.
- When using SAS adapters in either an HA Two System RAID or HA
Single System RAID configuration, ensure that the actions taken in
this MAP are performed on the Primary adapter, not the Secondary adapter.
- Before performing the system verification action in this MAP, reconstruct
any degraded disk arrays if possible. This helps avoid potential
data loss resulting from the adapter reset that is performed during the
system verification action in this MAP.
Attention: When SAS fabric problems exist, obtain
assistance from your hardware service provider:
- Before you replace a RAID adapter. Because the adapter might contain
nonvolatile write cache data and configuration data for the attached
disk arrays, additional problems can be created by replacing an adapter.
- Before you remove functioning disks in a disk array. A disk array
might become degraded or failed and additional problems might be created
if functioning disks are removed from a disk array.
Step 3152-1
Determine whether the problem still exists for the adapter that
logged this error by examining the SAS connections as follows:
- Start the IBM® SAS Disk Array Manager.
- Start Diagnostics and select Task Selection on
the Function Selection screen.
- Select RAID Array Manager.
- Select IBM SAS Disk Array Manager.
- Select Diagnostics and Recovery Options.
- Select Show SAS Controller Physical Resources.
- Select Show Fabric Path Graphical View.
Do all expected devices appear in the list and are all paths
marked as Operational?
- No
- Go to Step 3152-2.
- Yes
- Go to Step 3152-6.
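The decision at the end of this step can be sketched in code. The following is only an illustration, assuming that the contents of the Fabric Path Graphical View have been transcribed by hand into a list of (device, status) records. The FabricPath type, the next_step function, and the device names are hypothetical and are not part of the IBM SAS Disk Array Manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricPath:
    """One row transcribed from the fabric path view (hypothetical type)."""
    device: str  # device label as shown in the view
    status: str  # path status text, for example "Operational"


def next_step(expected_devices, paths):
    """Return the MAP step to go to after the check in Step 3152-1.

    expected_devices: names of the devices that should be attached.
    paths: FabricPath records copied from the graphical view.
    """
    seen = {p.device for p in paths}
    missing = set(expected_devices) - seen
    not_operational = [p for p in paths if p.status != "Operational"]
    # "No" branch: a device is missing or at least one path is not Operational.
    if missing or not_operational:
        return "3152-2"
    # "Yes" branch: every expected device is listed and every path is Operational.
    return "3152-6"
```

A device with two paths contributes two records, so a single path that is not Operational is enough to take the "No" branch.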
Step 3152-2
Run diagnostics
in system verification mode on the adapter to rediscover the devices
and connections.
- Start Diagnostics and select Task Selection on
the Function Selection screen.
- Select Run Diagnostics.
- Select the adapter resource.
- Select System Verification.
Note: Disregard any trouble found for now, and continue with
the next step.
Step 3152-3
Determine whether the problem still exists for the adapter that
logged this error by examining the SAS connections as follows:
- Start the IBM SAS Disk Array Manager.
- Start Diagnostics and select Task Selection on
the Function Selection screen.
- Select RAID Array Manager.
- Select IBM SAS Disk Array Manager.
- Select Diagnostics and Recovery Options.
- Select Show SAS Controller Physical Resources.
- Select Show Fabric Path Graphical View.
- Select a device with a path which is not Operational (if
one exists) to obtain additional details about the full path from
the adapter port to the device. Refer to Viewing SAS fabric path information for
an example of how this additional detail can be used to help isolate
where in the path the problem exists.
Do all expected devices appear in the list and are all paths
marked as Operational?
- No
- Go to Step 3152-4.
- Yes
- Go to Step 3152-6.
Step 3152-4
Because the problem persists, corrective action is needed to resolve
it. Proceed as follows:
- Power off the system or logical partition.
- Perform only one of the corrective actions listed below. The actions
are listed in order of preference. If a corrective action has already
been attempted, proceed to the next one in the list.
Note: Before replacing parts, consider powering off the entire
system, including any external device enclosures, to reset all
possible failing components. This might correct the problem without
replacing parts.
- Power on the system or logical partition.
Note: In some situations,
it might be acceptable to unconfigure and reconfigure the adapter
instead of powering off and powering on the system or logical partition.
Step 3152-5
Determine if the problem still
exists for the adapter that logged this error by examining the SAS
connections as follows:
- Start the IBM SAS Disk Array Manager.
- Start Diagnostics and select Task Selection on
the Function Selection screen.
- Select RAID Array Manager.
- Select IBM SAS Disk Array Manager.
- Select Diagnostics and Recovery Options.
- Select Show SAS Controller Physical Resources.
- Select Show Fabric Path Graphical View.
- Select a device with a path which is not Operational (if
one exists) to obtain additional details about the full path from
the adapter port to the device. Refer to Viewing SAS fabric path information for
an example of how this additional detail can be used to help isolate
where in the path the problem exists.
Do all expected devices appear in the list and are all paths
marked as Operational?
- No
- Go to Step 3152-4.
- Yes
- Go to Step 3152-6.
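Steps 3152-4 and 3152-5 form a loop: perform one corrective action, recheck the fabric paths, and repeat with the next action if the problem persists. The following sketch shows only that control flow. The run_corrective_loop function and its arguments are hypothetical: each action is a placeholder callable standing for one corrective action performed by the service person, and fabric_is_healthy stands for the manual check in Step 3152-5. The result returned when the list is exhausted is an assumption, because this MAP does not define that case.

```python
def run_corrective_loop(actions, fabric_is_healthy):
    """Try corrective actions in order of preference, one per pass.

    actions: callables, ordered by preference (placeholders for manual work).
    fabric_is_healthy: callable returning True when all expected devices
        appear and all paths are marked Operational.
    Returns (next_step, tried), where tried lists the names of the actions
    that were performed.
    """
    tried = []
    for action in actions:
        # Step 3152-4: perform exactly one corrective action, then power on.
        action()
        tried.append(getattr(action, "__name__", repr(action)))
        # Step 3152-5: recheck the fabric paths.
        if fabric_is_healthy():
            return "3152-6", tried
    # Assumption: no further step is defined here once the list is exhausted.
    return None, tried
```

Performing one action per pass keeps the result attributable: the action that precedes the first successful recheck is the one that resolved the problem.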
Step 3152-6
When the problem is resolved, see the removal and replacement
procedures topic for the system unit on which you are working and
do the "Verifying the repair" procedure.