Choosing the recovery procedure for a disk failure or disk errors

Use this information to determine the actions that you need to take when you recover your system because of a failed or damaged disk unit.

Note: If you receive a system reference code (SRC) that indicates a disk problem, do not perform an initial program load (IPL) before your service representative arrives. If you perform an IPL, your service representative might not be able to recover data from the damaged disk unit.

The steps you follow to recover from a disk failure depend on the following items:

  • What unit failed.
  • Whether disk protection, such as device parity protection or mirrored protection, is active.
  • Whether you have user auxiliary storage pools (ASPs) configured.
  • Whether some or all of the sectors on the disk are damaged. If a disk unit must be replaced, a service representative normally tries to copy the information from the disk unit when it is replaced. This procedure is sometimes referred to as a disk pump.

Use Table 1 to determine which recovery procedure to follow, based on the failure that has occurred on your system. To find your situation in the table, ask your service representative whether data was copied successfully (the results of the disk pump). The service representative's terms correspond to the terms in the table as follows:

Service representative terminology | Terminology in recovery charts
Full pump | None of the data is lost
Partial pump | Some of the data is lost
Could not pump | All of the data is lost
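
The correspondence between the two sets of terms can be written down as a small lookup. This is only an illustrative sketch: the names `PUMP_RESULT_TO_DATA_LOSS` and `data_loss_from_pump` are hypothetical and are not part of any system command or API.

```python
# Hypothetical helper: translate the service representative's disk pump
# result into the data-loss value used in the recovery charts.
PUMP_RESULT_TO_DATA_LOSS = {
    "full pump": "none",       # none of the data is lost
    "partial pump": "some",    # some of the data is lost
    "could not pump": "all",   # all of the data is lost
}


def data_loss_from_pump(result: str) -> str:
    """Return 'none', 'some', or 'all' for a reported pump result."""
    key = result.strip().lower()
    if key not in PUMP_RESULT_TO_DATA_LOSS:
        raise ValueError(f"unknown pump result: {result!r}")
    return PUMP_RESULT_TO_DATA_LOSS[key]
```

For example, `data_loss_from_pump("Partial pump")` gives `"some"`, which is the value to look for in the data loss column.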

Recovery for disk errors that do not require disk replacement: Some types of disk units automatically recover from errors without needing to be replaced. In some cases, however, sectors are damaged before the disk unit reassigns them, and some object damage occurs. If you receive a message indicating that object damage has occurred and disk sectors have been reassigned, treat this as the value Some in the Data loss on failed unit column of Table 1.

If you are recovering from disk errors but you did not need a service representative to replace the disk unit, you might need to perform tasks that are normally performed by a service representative. Make a copy of the appropriate checklist and mark it as follows:
  1. Begin at the task immediately following Attach the new disk unit.
  2. If the checklist contains a task called Restore the disk unit data, skip that task.
Table 1. Choosing the correct recovery procedure for disk media failure
Type of unit that failed | Data loss on failed unit | Availability protection on failed unit | User ASPs configured? | Procedure to follow
Any | N/A | Mirrored protection | N/A [1] | Checklist 14: Actions for non-load source disk unit failure
Any | N/A | Device parity protection | N/A [1] | Checklist 15: Actions for non-load source disk unit failure
Load source unit | None | None | N/A [1] | Checklist 1: Actions for load-source disk unit failure
Load source unit | Some [2] | None | N/A [1] | Checklist 2: Actions for load-source disk unit failure
Load source unit | All | None | No | Checklist 3: Actions for load-source disk unit failure
Load source unit. No basic ASPs in overflowed status [3]. | All | None | Yes | Checklist 4: Actions for load-source disk unit failure
Load source unit. One or more basic ASPs in overflowed status [3]. | All | None | Yes | Checklist 5: Actions for load-source disk unit failure
Non-load source unit in system ASP [4] | None | None | N/A [1] | Checklist 6: Actions for non-load source disk unit failure or disk units in basic user auxiliary storage pool disk failure
Non-load source unit in system ASP [4] | Some [2] | None | N/A [1] | Checklist 7: Actions for non-load source disk unit failure
Non-load source unit in system ASP [4] | All | None | No | Checklist 8: Actions for non-load source disk unit failure
Non-load source unit in system ASP [4]. No basic ASPs in overflowed status [3]. | All | None | Yes | Checklist 9: Actions for non-load source disk unit failure
Non-load source unit in system ASP [4]. One or more basic ASPs in overflowed status [3]. | All | None | Yes | Checklist 10: Actions for non-load source disk unit failure
Disk unit in basic ASP | None | None | Yes | Checklist 6: Actions for non-load source disk unit failure or disk units in basic user auxiliary storage pool disk failure
Disk unit in basic ASP | Some [2] | None | Yes | Checklist 11: Actions for a failure in a basic auxiliary storage pool disk unit
Disk unit in basic ASP. Failed unit not in overflowed status [3]. | All | None | Yes | Checklist 12: Actions for a failure in a basic auxiliary storage pool disk unit
Disk unit in basic ASP. Failed unit in overflowed status [3]. | All | None | Yes | Checklist 13: Actions for a failure in a basic auxiliary storage pool disk unit
Disk unit in independent ASP | None | None | Yes | Checklist 17: Actions for independent auxiliary storage pool disk failure
Disk unit in independent ASP | Some [2] | None | Yes | Checklist 18: Actions for a failure in an independent auxiliary storage pool disk unit
Disk unit in independent ASP | All | None | Yes | Checklist 19: Actions for a failure in an independent auxiliary storage pool disk unit
Cache storage in input/output processor (IOP) | Some | N/A | N/A [1] | Checklist 23: Actions for a failed cache card
[1] The recovery procedure is the same whether or not user ASPs are configured.
[2] If the service representative was only partially successful in saving data from a failed disk unit, you should consider treating the situation as a complete data loss on the failed unit.
[3] Step 4 in the Resetting an overflowed user ASP without an IPL topic describes how to determine whether a user ASP is in overflowed status.
[4] If a unit in your system ASP fails and a replacement is not immediately available, you can use the procedure in the Checklist 16: Actions for non-load source disk unit failure topic. This procedure allows you to return your system to operation. You will have less disk storage, and you will need to recover all the data in the system ASP.
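
The rows of Table 1 can also be read as a decision procedure. The following Python sketch transcribes the table into a lookup so that the order of the checks is explicit. It is an illustration only, not a system tool: the function name `choose_checklist`, its parameters, and the string values such as `"load_source"` and `"device_parity"` are hypothetical names chosen for this example. The sketch also assumes that the cache row is checked before the protection rows, and that a single `overflowed` flag stands for "one or more basic ASPs in overflowed status" for load source and system ASP units and for "failed unit in overflowed status" for a unit in a basic ASP.

```python
# Rows with active disk protection: the unit type and data loss do not matter.
PROTECTED = {"mirrored": 14, "device_parity": 15}

# Unprotected units where none or only some of the data is lost.
NONE_OR_SOME = {
    ("load_source", "none"): 1, ("load_source", "some"): 2,
    ("system_asp", "none"): 6, ("system_asp", "some"): 7,
    ("basic_asp", "none"): 6, ("basic_asp", "some"): 11,
    ("independent_asp", "none"): 17, ("independent_asp", "some"): 18,
}

# Unprotected units with total data loss:
# (no user ASPs, user ASPs and no overflow, user ASPs and overflow).
ALL_LOST = {"load_source": (3, 4, 5), "system_asp": (8, 9, 10)}


def choose_checklist(unit, data_loss, protection="none",
                     user_asps=False, overflowed=False):
    """Return the checklist number that Table 1 lists for a failure."""
    if unit == "cache":                       # cache storage in the IOP
        return 23
    if protection in PROTECTED:
        return PROTECTED[protection]
    if protection != "none":
        raise ValueError(f"unknown protection: {protection!r}")
    if (unit, data_loss) in NONE_OR_SOME:
        return NONE_OR_SOME[(unit, data_loss)]
    if data_loss != "all":
        raise ValueError(f"no row for {unit!r} with data loss {data_loss!r}")
    if unit == "independent_asp":
        return 19
    if unit == "basic_asp":                   # user ASPs exist by definition
        return 13 if overflowed else 12
    if unit in ALL_LOST:
        no_user_asps, no_overflow, with_overflow = ALL_LOST[unit]
        if not user_asps:
            return no_user_asps
        return with_overflow if overflowed else no_overflow
    raise ValueError(f"unknown unit type: {unit!r}")
```

For example, `choose_checklist("load_source", "all", user_asps=True, overflowed=True)` returns 5, matching the row for a load source unit with one or more basic ASPs in overflowed status. The sketch does not encode the footnotes: a partial pump may still be better treated as a complete loss, and Checklist 16 remains an alternative when no replacement unit is available for the system ASP.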