IBM Support

Data corruption issue with VMware ESXi 5.0 on IBM System Storage DCS3700, DS3512, DS3524 with VAAI enabled

Troubleshooting


Problem

vStorage APIs for Array Integration (VAAI) is a feature, first introduced in ESX/ESXi 4.1, that provides hardware acceleration functionality. It enables an ESXi host to offload specific virtual machine and storage management operations to compliant storage hardware. With the storage hardware assistance, the ESXi host performs these operations faster and consumes less CPU, memory, and storage fabric bandwidth.

Data corruption has been reported when using VMware ESXi 5.0 with VAAI hardware-accelerated VMFS data movement enabled on IBM System Storage DS3500 and DCS3700 storage controllers. The issue occurs when XCOPY is used to improve the performance of the following VMware operations:

- Storage vMotion
- Virtual machine cloning
- Virtual machine snapshots
- Deploy from a virtual machine template

Symptom


The virtual machine deploys from a template without error and powers on, but then crashes with a BSOD or, at other times, stops at the Windows 2008 recovery screen. The following symptoms were reported when deploying a Windows 2008 virtual machine from a template with VAAI enabled:


1. The virtual machine stopped during boot with the message:

A disk read error occurred
Press Ctrl+Alt+Del to restart

2. The virtual machine crashed with a BSOD showing the message PAGE_FAULT_IN_NONPAGED_AREA

3. The virtual machine failed to boot with the message:

"The file or directory D:\Windows\System32 is corrupt and unreadable. Please run the Chkdsk utility."

Cause

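The issue occurs when the ESXi host uses XCOPY (VAAI hardware-accelerated data movement) to offload copy operations to the storage controller. The controller firmware change history describes the defect as an LBA shifting data corruption error using XCOPY via VAAI.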

Environment

VMware ESXi 5.0 hosts with IBM System Storage DCS3700, DS3512, or DS3524 running controller firmware versions 07.83.18.00 through 07.83.25.00
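
As an illustration only, the affected firmware range listed above can be expressed as a small shell check. This is a minimal sketch, not an IBM or VMware tool: the function name is_affected_firmware is hypothetical, and it assumes the version string has the four dotted numeric fields shown in this document. It encodes only the range stated here and says nothing about versions outside it.

```shell
#!/bin/sh
# Sketch only: returns success (0) when the given controller firmware version
# lies in the affected range stated in this document, failure (1) otherwise.
# is_affected_firmware is a hypothetical helper name, not an IBM or VMware tool.
is_affected_firmware() {
    awk -v v="$1" -v lo="07.83.18.00" -v hi="07.83.25.00" '
        # Turn a dotted version into one comparable number, field by field.
        function key(s,    p) {
            split(s, p, ".")
            return ((p[1] * 1000 + p[2]) * 1000 + p[3]) * 1000 + p[4]
        }
        BEGIN { exit !(key(v) >= key(lo) && key(v) <= key(hi)) }
    '
}

# Example: classify a few version strings.
for fw in 07.83.18.00 07.83.25.00 07.83.27.00; do
    if is_affected_firmware "$fw"; then
        echo "$fw: in the affected range"
    else
        echo "$fw: outside the affected range"
    fi
done
```

The comparison is done numerically per field, so leading zeros such as 07 or 08 are handled as plain decimal values.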

Resolving The Problem


Solution

This behavior has been corrected in IBM DS3500/DCS3700 controller firmware version 07.83.27.00, which was released on November 20, 2012.

From the change history:

November 19, 2012 - Version 07.83.27.00

- Fix the following defects Severity 1
- LSIP200338869 (CL LSIP200335780) (TD_IOP7034) LBA shifting data corruption
error using XCOPY via VAAI
- LSIP200340469 (CL LSIP200336003) (XB115456) Controller hit a driver bug
condition that leads to data corruption

The firmware is available from IBM Support's Fix Central web page by selecting Product Group "System Storage", type of System Storage "Disk Systems", Disk system "Entry-Level disk systems", and Entry-level disk systems "DS3500 (DS3512, DS3524)", at the following URL:

http://www.ibm.com/support/fixcentral/


Workaround

Users potentially exposed to this issue should disable the portion of VAAI on the vSphere server that utilizes XCOPY on the storage controller, using the following ESXi command:

esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedMove

This disables only hardware acceleration of these VMware operations; the basic operations will still function properly. Once the hardware acceleration is disabled, the possibility of data corruption from this issue is eliminated.
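
The workaround can also be wrapped so that the setting is read back after it is changed. The following is a minimal sketch under stated assumptions: the function names and the ADVCFG override variable are hypothetical, and the parsing assumes that esxcfg-advcfg -g prints a single line ending in the current value (for example "Value of HardwareAcceleratedMove is 1"). Verify the output format on your own host before relying on it.

```shell
#!/bin/sh
# Sketch only: apply the workaround from this document and confirm the result.
# ADVCFG is a hypothetical override hook; on an ESXi host it defaults to the
# real esxcfg-advcfg command.
ADVCFG=${ADVCFG:-esxcfg-advcfg}
OPTION=/DataMover/HardwareAcceleratedMove

# Print the current value of the option.
# Assumption: "-g" prints one line whose last word is the value.
xcopy_offload_value() {
    out=$("$ADVCFG" -g "$OPTION") || return 1
    echo "${out##* }"
}

# Set the option to 0, then re-read it; succeeds only if it now reads 0.
disable_xcopy_offload() {
    "$ADVCFG" -s 0 "$OPTION" >/dev/null || return 1
    [ "$(xcopy_offload_value)" = "0" ]
}
```

On a host, calling disable_xcopy_offload and checking its exit status would show whether the option reads 0 afterwards. The setting is per host, so each ESXi host that accesses the affected storage needs the change.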

See IBM RETAIN Tip H206954 for details:

http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004749

Document Information

Modified date:
28 January 2020

UID

isg3T1019045