ASYNC_IO_OPERATION_FAILED when ProtecTIER backend disk storage is inaccessible for more than 5 minutes


Flash (Alert)


Abstract

IBM has found that, under certain circumstances, when Qlogic backend ports are disconnected from the disk storage for more than 5 minutes, an ASYNC_IO_OPERATION_FAILED message may appear in the logs, indicating that data in the repository is inaccessible.

Content

Cause
Applies to ProtecTIER systems running code levels below V3.1.10.0.

After multiple Qlogic abort commands and VTL I/O timeout failures, ProtecTIER's asynchronous I/O method overwrites 1 MB buffers in RAM, causing loss of access to data.

Symptom / problem diagnosis / key to search
The vtf internal logs can show messages such as:
Aug 6 22:15:03 ibm0vtl01 vtl[4137]: (9664)[ERROR]: GEN: AioVec::ProcessAioStatus: query aio status failed ctxN 0 nMaxAioCtx 64 ioEvent.res 802816 p=0x2aacaf7c14f8 offset=23653473280 buf=0x2aacbe8cd200 size=65536 - ASYNC_IO_OPERATION_FAILED
Aug 6 22:15:18 ibm0vtl01 vtl[4137]: (7360)[ERROR]: CartRep: CartridgeReplication::ReadDataCheckCRC: cart 0006_01_0000000983 CheckEntry failed - CHECKSUM_MISMATCH
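
The search key above can be checked mechanically. The following is a minimal Python sketch, not an IBM-provided tool: it assumes the vtf internal log lines have already been read into a list of strings and that they carry the syslog-style prefix shown in the excerpt. The function name nodes_with_async_io_failure is hypothetical. It also tolerates the marker appearing on a wrapped continuation line.

```python
import re

# Syslog-style prefix as seen in the excerpt above, e.g.
# "Aug 6 22:15:03 ibm0vtl01 vtl[4137]: ..." (assumed format).
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
)
MARKER = "ASYNC_IO_OPERATION_FAILED"


def nodes_with_async_io_failure(lines):
    """Return {hostname: timestamp of first entry carrying the marker}.

    Hypothetical helper: a line without a syslog prefix is treated as a
    continuation of the most recent prefixed line.
    """
    hits = {}
    last = None  # (timestamp, hostname) of the latest prefixed line
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            last = (match.group("ts"), match.group("host"))
        if MARKER in line and last is not None:
            ts, host = last
            hits.setdefault(host, ts)  # keep only the first occurrence
    return hits
```

Every hostname returned by such a check is a node on which vtfd would need to be restarted, per the guidance later in this flash.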

Qlogic aborts and failures connecting to the storage LUNs can be found in the message logs around the same time as the errors in the vtf internal logs:
Aug 6 04:47:08 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: scsi(5:0:1): Abort command issued -- 1 a47dffa 2002.
Aug 6 04:48:08 ibm0vtl01 kernel: rport-5:0-0: blocked FC remote port time out: saving binding
Aug 6 04:48:08 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: scsi(5:0:0): DEVICE RESET ISSUED.
Aug 6 04:48:40 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: qla2xxx_eh_bus_reset: reset succeeded
Aug 6 04:48:50 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: Performing ISP error recovery - ha= ffff81047dd184f8.
Aug 6 04:48:50 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: LIP reset occured (f700).
Aug 6 04:48:50 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: LOOP UP detected (8 Gbps).
Aug 6 04:49:11 ibm0vtl01 kernel: qla2xxx 0000:18:00.0: qla2xxx_eh_host_reset: reset succeeded
Aug 6 04:49:21 ibm0vtl01 kernel: sd 5:0:0:1: scsi: Device offlined - not ready after error recovery
Aug 6 04:49:21 ibm0vtl01 last message repeated 3 times
Aug 6 04:49:21 ibm0vtl01 kernel: sd 5:0:0:1: rejecting I/O to offline device
Aug 6 04:49:21 ibm0vtl01 kernel: sd 5:0:0:1: rejecting I/O to offline device
Aug 6 04:49:21 ibm0vtl01 last message repeated 3 times
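
To get a quick picture of backend trouble in a large message log, the kernel entries shown above can be tallied by type. This is a rough Python sketch under stated assumptions: the substrings are taken verbatim from the excerpt above, the category names and the function summarize_backend_events are hypothetical, and "last message repeated N times" lines are not expanded.

```python
from collections import Counter

# Substrings copied from the kernel messages in the excerpt above.
# The category names on the left are illustrative labels only.
PATTERNS = {
    "abort": "Abort command issued",
    "rport_timeout": "blocked FC remote port time out",
    "device_reset": "DEVICE RESET ISSUED",
    "device_offlined": "Device offlined",
    "io_rejected": "rejecting I/O to offline device",
}


def summarize_backend_events(lines):
    """Count kernel log lines per backend event category."""
    counts = Counter()
    for line in lines:
        if " kernel: " not in line:
            continue  # skips syslog "last message repeated" lines too
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

A non-zero count for device_offlined or io_rejected near the time of an ASYNC_IO_OPERATION_FAILED entry suggests the two sets of errors are related, but the tally is only a hint and does not replace reading the logs.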

If this failure is encountered:
Stop vtfd immediately and keep it stopped until the backend issues are resolved.
After vtfd is restarted and the buffers are reset, all older data can be read correctly without errors. Even if the backend issues have already been resolved, vtfd must be restarted on each node whose logs contain the string ASYNC_IO_OPERATION_FAILED.

Create a list of ALL cartridges that were written to from the time this misbehavior started until vtfd was stopped (on both nodes), and have the backup application back up that data again.
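
Building the cartridge list amounts to filtering write activity by a time window. The Python sketch below illustrates that filter only; it assumes the write activity from both nodes has already been exported, by whatever means your backup application or site tooling provides, as (barcode, write time) pairs. That input format, the sample barcodes, and the function name cartridges_to_rewrite are hypothetical and not part of ProtecTIER.

```python
from datetime import datetime


def cartridges_to_rewrite(write_events, first_error, vtfd_stopped):
    """Return the sorted, de-duplicated barcodes written in the window.

    write_events: iterable of (barcode, datetime) pairs (assumed format).
    first_error:  datetime when the misbehavior started.
    vtfd_stopped: datetime when vtfd was stopped on the last node.
    """
    return sorted(
        {
            barcode
            for barcode, written_at in write_events
            if first_error <= written_at <= vtfd_stopped
        }
    )


# Usage with made-up barcodes and times, purely for illustration.
events = [
    ("A00001", datetime(2012, 8, 6, 3, 0)),    # before the window
    ("A00002", datetime(2012, 8, 6, 5, 30)),   # inside the window
    ("A00002", datetime(2012, 8, 6, 9, 0)),    # same cartridge again
    ("A00003", datetime(2012, 8, 6, 23, 59)),  # after vtfd was stopped
]
window_start = datetime(2012, 8, 6, 4, 47)
window_end = datetime(2012, 8, 6, 22, 30)
print(cartridges_to_rewrite(events, window_start, window_end))
```

When in doubt about the exact start time, choosing an earlier window start errs on the side of backing up more data again rather than less.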

Resolving the problem / Resolution
The fix for the asynchronous I/O mechanism, fix number 81017784, is included in code V3.1.10.0 and above. Upgrade your ProtecTIER systems as soon as possible.
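
Whether a system is exposed comes down to comparing its code level with V3.1.10.0. Note that a plain string comparison would wrongly rank 3.1.8 above 3.1.10.0, so the components must be compared numerically. The Python sketch below assumes purely numeric, dot-separated version strings with an optional leading "V"; the function names are hypothetical, and how the installed version is obtained is outside its scope.

```python
FIXED_IN = (3, 1, 10, 0)  # first code level containing fix 81017784


def parse_version(text):
    """Turn 'V3.1.8' or '3.1.10.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().lstrip("Vv").split("."))


def needs_upgrade(version_text):
    """Return True when the given code level is below V3.1.10.0."""
    version = parse_version(version_text)
    # Pad short versions with zeros so that 3.1 compares as 3.1.0.0.
    version += (0,) * (len(FIXED_IN) - len(version))
    return version < FIXED_IN


for level in ("2.5", "3.1.8", "V3.1.10.0"):
    print(level, needs_upgrade(level))
```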


Cross reference information
Segment: Tape Storage
Product: TS7650 with ProtecTIER
Component: (not listed)
Platform: Open Systems
Version: 2.4, 2.5, 3.1, 3.1.8
Edition: Advanced, Enterprise, N/A

Copyright and trademark information

IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Document information

TS7650G with ProtecTIER


Version:
2.2, 2.3, 2.3.3.0, 2.4, 2.5, 3.1, 3.1.8


Operating system(s):
Open Systems


Software edition:
Enterprise, N/A


Reference #:
S1004188


Modified date:
2012-08-22
