Use this MAP to resolve the following problem: Multipath redundancy
level got worse (SRN nnnn - 4060)
The possible causes are:
- A failed connection caused by a failing component in the SAS fabric
between, and including, the adapter and device enclosure.
- A failed connection caused by a failing component within the device
enclosure, including the device itself.
Note: The failed connection was previously working, and might
have already recovered.
Considerations:
- Remove power from the system before connecting and disconnecting
cables or devices, as appropriate, to prevent hardware damage or erroneous
diagnostic results.
- Some systems have SAS and PCI-X or PCIe bus interface logic integrated
onto the system boards and use a pluggable RAID enablement card (a
non-PCI form factor card) for such integrated-logic buses. See the
feature comparison tables for PCIe and PCI-X cards.
For these configurations, replacement of the RAID enablement card
is unlikely to solve a SAS-related problem because the SAS interface
logic is on the system board.
- Some systems have the disk enclosure or removable media enclosure
integrated in the system with no cables. For these configurations
the SAS connections are integrated onto the system boards and a failed
connection can be the result of a failed system board or integrated
device enclosure.
- Some configurations involve a SAS adapter connecting to internal
SAS disk enclosures within a system using a FC3650 or FC3651 cable
card. Keep in mind that when the MAP refers to a device enclosure,
it could be referring to the internal SAS disk slots or media slots.
Also, when the MAP refers to a cable, it could include a FC3650 or
FC3651 cable card.
- When using SAS adapters in either an HA Two System RAID or HA
Single System RAID configuration, ensure that the actions taken in
this MAP are against the Primary adapter and not the Secondary adapter.
- Before running the system verification action in this MAP, reconstruct
any degraded disk arrays if possible. Doing so helps avoid potential
data loss resulting from the adapter reset that is performed during the
system verification action.
Attention: When SAS fabric problems exist, obtain
assistance from your hardware service provider:
- Before you replace a RAID adapter. Because the adapter might contain
nonvolatile write cache data and configuration data for the attached
disk arrays, additional problems can be created by replacing an adapter.
- Before you remove functioning disks in a disk array. A disk array
might become degraded or failed and additional problems might be created
if functioning disks are removed from a disk array.
Step 3153-1
Determine if the problem still
exists for the adapter that logged this error by examining the SAS
connections as follows:
- Start the IBM® SAS Disk Array Manager.
  - Start Diagnostics and select Task Selection on
    the Function Selection screen.
  - Select RAID Array Manager.
  - Select IBM SAS Disk Array Manager.
- Select Diagnostics and Recovery Options.
- Select Show SAS Controller Physical Resources.
- Select Show Fabric Path Graphical View.
Do all expected devices appear in the list and are all paths
marked as Operational?
- No
- Go to Step 3153-2.
- Yes
- Go to Step 3153-6.
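The Yes/No question above combines two conditions: every expected device must appear in the list, and every path to every device must be marked Operational. As a minimal sketch only, the check can be written out as follows. It assumes that the device names and path states have been transcribed by hand from the Fabric Path Graphical View into a dictionary; the function name, the data layout, and the sample device names are hypothetical and are not part of any IBM tool or interface.

```python
def fabric_is_healthy(expected_devices, observed_paths):
    """Return (ok, problems) for a manually transcribed fabric path view.

    expected_devices: set of device names that should be attached (assumed).
    observed_paths: dict mapping each listed device name to a list of
        path state strings as shown on the screen (hypothetical layout).
    """
    problems = []
    for device in sorted(expected_devices):
        if device not in observed_paths:
            # Condition 1: the device does not appear in the list at all.
            problems.append(f"{device}: missing from list")
            continue
        states = observed_paths[device]
        if not states:
            problems.append(f"{device}: no paths reported")
        for index, state in enumerate(states):
            # Condition 2: every path must be marked Operational.
            if state != "Operational":
                problems.append(f"{device}: path {index} is {state}")
    return (not problems, problems)


if __name__ == "__main__":
    # Sample values are illustrative only.
    ok, problems = fabric_is_healthy(
        {"pdisk0", "pdisk1"},
        {"pdisk0": ["Operational", "Failed"]},
    )
    print("Yes" if ok else "No")
    for line in problems:
        print(" ", line)
```

An answer of No from either condition leads to the same next step, so the sketch reports both kinds of problem in one list rather than distinguishing them in the return value.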
Step 3153-2
Run diagnostics
in system verification mode on the adapter to rediscover the devices
and connections.
- Start Diagnostics and select Task Selection on
the Function Selection screen.
- Select Run Diagnostics.
- Select the adapter resource.
- Select System Verification.
Note: Disregard any trouble found for now, and continue with
the next step.
Step 3153-3
Determine if the problem still
exists for the adapter that logged this error by examining the SAS
connections as follows:
- Start the IBM SAS Disk Array Manager.
  - Start Diagnostics and select Task Selection on
    the Function Selection screen.
  - Select RAID Array Manager.
  - Select IBM SAS Disk Array Manager.
- Select Diagnostics and Recovery Options.
- Select Show SAS Controller Physical Resources.
- Select Show Fabric Path Graphical View.
- Select a device with a path that is not Operational (if
one exists) to obtain additional details about the full path from
the adapter port to the device. Refer to "Viewing SAS fabric path
information" for an example of how this additional detail can be used
to help isolate where in the path the problem exists.
Do all expected devices appear in the list and are all paths
marked as Operational?
- No
- Go to Step 3153-4.
- Yes
- Go to Step 3153-6.
Step 3153-4
Because the problem
persists, corrective action is needed to resolve it.
Proceed by doing the following:
- Power off the system or logical partition.
- Perform only one of the corrective actions listed below, which
are given in order of preference. If one of the corrective actions
has already been attempted, proceed to the next one in the
list.
Note: Before replacing parts, consider a complete
power-down of the entire system, including any external device enclosures,
to reset all possible failing components. This might
correct the problem without replacing parts.
- Power on the system or logical partition.
Note: In some situations,
it might be acceptable to unconfigure and reconfigure the adapter
instead of powering off and powering on the system or logical partition.
Step 3153-5
Determine if the problem still
exists for the adapter that logged this error by examining the SAS
connections as follows:
- Start the IBM SAS Disk Array Manager.
  - Start Diagnostics and select Task Selection on
    the Function Selection screen.
  - Select RAID Array Manager.
  - Select IBM SAS Disk Array Manager.
- Select Diagnostics and Recovery Options.
- Select Show SAS Controller Physical Resources.
- Select Show Fabric Path Graphical View.
- Select a device with a path that is not Operational (if
one exists) to obtain additional details about the full path from
the adapter port to the device. Refer to "Viewing SAS fabric path
information" for an example of how this additional detail can be used
to help isolate where in the path the problem exists.
Do all expected devices appear in the list and are all paths
marked as Operational?
- No
- Go to Step 3153-4.
- Yes
- Go to Step 3153-6.
Step 3153-6
When the problem is resolved, see the removal and replacement
procedures topic for the system unit on which you are working and
do the "Verifying the repair" procedure.
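
The branching between Steps 3153-1 and 3153-6 can be summarized as a small transition table. The following is a sketch only, intended as a reading aid for the flow of this MAP; the names CHECK_STEPS, ACTION_STEPS, next_step, and walk are hypothetical, and the Yes/No answers are assumed to be supplied by the person who examined the fabric path view.

```python
# Steps that end with the question "Do all expected devices appear in the
# list and are all paths marked as Operational?": (next if Yes, next if No)
CHECK_STEPS = {
    "3153-1": ("3153-6", "3153-2"),
    "3153-3": ("3153-6", "3153-4"),
    "3153-5": ("3153-6", "3153-4"),
}
# Steps that perform an action and then always continue to the next step.
ACTION_STEPS = {"3153-2": "3153-3", "3153-4": "3153-5"}
FINAL_STEP = "3153-6"


def next_step(step, all_operational=None):
    """Return the step that follows `step` in this MAP."""
    if step in CHECK_STEPS:
        if all_operational is None:
            raise ValueError(f"{step} needs a Yes/No answer")
        yes_target, no_target = CHECK_STEPS[step]
        return yes_target if all_operational else no_target
    if step in ACTION_STEPS:
        return ACTION_STEPS[step]
    raise ValueError(f"{step} has no successor")


def walk(check_results):
    """List the steps visited for a sequence of Yes/No check answers."""
    answers = iter(check_results)
    step = "3153-1"
    visited = [step]
    while step != FINAL_STEP:
        if step in CHECK_STEPS:
            try:
                answer = next(answers)
            except StopIteration:
                break  # no more answers supplied; stop at this check
            step = next_step(step, answer)
        else:
            step = next_step(step)
        visited.append(step)
    return visited


if __name__ == "__main__":
    # Problem persists twice, then clears after one corrective action.
    print(" -> ".join(walk([False, False, True])))
```

The table makes one property of the procedure explicit: Steps 3153-4 and 3153-5 form a loop that repeats, with a different corrective action each time, until the check in Step 3153-5 is answered Yes.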