ASYNC_IO_OPERATION_FAILED when ProtecTIER backend disk storage is inaccessible for more than 5 minutes

Flash (Alert)


Abstract

IBM has found that, under certain circumstances in which QLogic backend ports are disconnected from the disk storage for more than 5 minutes, an ASYNC_IO_OPERATION_FAILED message may appear in the logs, indicating that data in the repository is inaccessible.

Content

Cause
Applies to ProtecTIER systems running code levels below V3.1.10.0.

After multiple QLogic abort commands and VTL I/O timeout failures, ProtecTIER's asynchronous I/O method overwrites 1 MB buffers in RAM, causing loss of access to data.

Symptom / problem diagnosis / key to search
The vtf internal logs can show messages such as:
Aug 6 22:15:03 ibm0vtl01 vtl[4137]: (9664)[ERROR]: GEN: AioVec::ProcessAioStatus: query aio status failed ctxN 0 nMaxAioCtx 64 ioEvent.res 802816 p=0x2aacaf7c14f8 offset=23653473280 buf=0x2aacbe8cd200 size=65536 - ASYNC_IO_OPERATION_FAILED
Aug 6 22:15:18 ibm0vtl01 vtl[4137]: (7360)[ERROR]: CartRep: CartridgeReplication::ReadDataCheckCRC: cart 0006_01_0000000983 CheckEntry failed - CHECKSUM_MISMATCH
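
As an illustration only, the following Python sketch shows one way to find which nodes logged the ASYNC_IO_OPERATION_FAILED string. It is not an IBM tool. The function name nodes_with_async_io_failure and the regular expression are assumptions made for this example; the only details taken from this flash are the marker string and the syslog-style line layout shown above. A marker that appears on a continuation line is attributed to the most recent host seen.

```python
import re

# Syslog-style prefix, e.g. "Aug 6 22:15:03 ibm0vtl01 vtl[4137]: ..."
# (assumed layout, based on the excerpt in this flash)
SYSLOG_PREFIX = re.compile(r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+(\S+)\s")
MARKER = "ASYNC_IO_OPERATION_FAILED"


def nodes_with_async_io_failure(lines):
    """Return the set of hostnames whose log records contain MARKER."""
    hosts = set()
    last_host = None  # host of the most recent line that had a syslog prefix
    for line in lines:
        match = SYSLOG_PREFIX.match(line)
        if match:
            last_host = match.group(1)
        # The marker may sit on a wrapped continuation line without a prefix,
        # so fall back to the last host seen.
        if MARKER in line and last_host is not None:
            hosts.add(last_host)
    return hosts
```

The function accepts any iterable of lines, so it can be fed an open log file from each node; where the vtf internal logs are stored is not covered here.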

QLogic abort commands and failures connecting to the storage LUNs can be found in the message logs during the same period as the errors shown in the vtf internal logs:
Aug 6 04:47:08 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: scsi(5:0:1): Abort command issued -- 1 a47dffa 2002.
Aug 6 04:48:08 ibm0vtl01 kernel: rport-5:0-0: blocked FC remote port time out: saving binding
Aug 6 04:48:08 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: scsi(5:0:0): DEVICE RESET ISSUED.
Aug 6 04:48:40 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: qla2xxx_eh_bus_reset: reset succeeded
Aug 6 04:48:50 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: Performing ISP error recovery - ha= ffff81047dd184f8.
Aug 6 04:48:50 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: LIP reset occured (f700).
Aug 6 04:48:50 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: LOOP UP detected (8 Gbps).
Aug 6 04:49:11 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: qla2xxx_eh_host_reset: reset succeeded
Aug 6 04:49:21 ibm0vtl01 kernel: sd 5:0:0:1: scsi: Device offlined - not ready after error recovery
Aug 6 04:49:21 ibm0vtl01 last message repeated 3 times
Aug 6 04:49:21 ibm0vtl01 kernel: sd 5:0:0:1: rejecting I/O to offline device
Aug 6 04:49:21 ibm0vtl01 kernel: sd 5:0:0:1: rejecting I/O to offline device
Aug 6 04:49:21 ibm0vtl01 last message repeated 3 times
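
To correlate the kernel messages with the vtf errors, it can help to extract when the backend trouble started and when it was last seen. The sketch below is a hedged example, not part of ProtecTIER: the function name backend_error_window, the choice of message fragments in PATTERNS (copied from the excerpt above), and the year parameter (syslog timestamps carry no year) are all assumptions. It also assumes English month abbreviations.

```python
import re
from datetime import datetime

# "Aug 6 04:47:08 ibm0vtl01 kernel: <message>" (assumed layout from the excerpt)
KERNEL_LINE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(.*)$"
)

# Message fragments taken from the excerpt above; which ones matter on a
# given system is an assumption.
PATTERNS = (
    "Abort command issued",
    "blocked FC remote port time out",
    "Device offlined",
    "rejecting I/O to offline device",
)


def backend_error_window(lines, year):
    """Return (first, last) datetimes of matching kernel messages, or None."""
    stamps = []
    for line in lines:
        match = KERNEL_LINE.match(line)
        if not match:
            continue  # e.g. "last message repeated 3 times"
        stamp, message = match.groups()
        if any(pattern in message for pattern in PATTERNS):
            # Prepend the caller-supplied year because syslog omits it.
            stamps.append(
                datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            )
    if not stamps:
        return None
    return min(stamps), max(stamps)
```

The difference between the two returned values gives the span over which these messages were logged, which can then be compared with the timestamps of the errors in the vtf internal logs.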

If this failure is encountered:
vtfd must be stopped immediately and kept stopped until the backend issues are resolved.
After vtfd is restarted and the buffers are reset, all older data can be read correctly without errors. Even if the backend issues are already resolved, vtfd must be restarted on each node whose logs contain the string ASYNC_IO_OPERATION_FAILED.

Create a list of ALL cartridges that were written to from the time this misbehavior started until vtfd was stopped (on both nodes), and have the backup application back that data up again.

Resolving the problem / Resolution
The fix for the asynchronous I/O mechanism, fix number 81017784, is included in code V3.1.10.0 and above. Please upgrade your ProtecTIER systems as soon as possible.
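
When checking many systems, the "below V3.1.10.0" test can be written as a numeric comparison of the dotted code level. The following is a minimal sketch under stated assumptions: the names parse_level and needs_upgrade are hypothetical, the code level is assumed to be a dotted string of integers with an optional leading "V", and how that string is obtained from a system is not covered here.

```python
# First code level that contains the fix, per this flash.
FIXED_LEVEL = (3, 1, 10, 0)


def parse_level(text):
    """Turn 'V3.1.8' or '3.1.10.0' into a 4-part tuple of integers."""
    parts = text.strip().lstrip("Vv").split(".")
    numbers = [int(part) for part in parts]
    # Pad short levels with zeros so '3.1.10' compares equal to '3.1.10.0'.
    numbers += [0] * (4 - len(numbers))
    return tuple(numbers)


def needs_upgrade(text):
    """Return True if the code level is below the fixed level."""
    return parse_level(text) < FIXED_LEVEL
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "3.1.8" above "3.1.10.0".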


Cross reference information
Segment Product Component Platform Version Edition
Tape Storage TS7650 with ProtecTIER Open Systems 2.4, 2.5, 3.1, 3.1.8 Advanced, Enterprise, N/A


Document information


More support for:

TS7650G with ProtecTIER

Version:

2.2, 2.3, 2.3.3.0, 2.4, 2.5, 3.1, 3.1.8

Operating system(s):

Open Systems

Software edition:

Enterprise, N/A

Reference #:

S1004188

Modified date:

2013-07-18
