Potential problem on XIV Storage System 10.2.2 and 10.2.2.a that can be caused by changing the system time via Network Time Protocol (NTP)

Technote (troubleshooting)


Problem (Abstract)

A thread within the cache node process, which runs on every module, may miss several heartbeats when the system clock is set backward. This in turn might cause one cache node in the system to fail and the system to go into a rebuild.

The backward clock change could have been made by an NTP server or by an xcli time_set command.

Symptom

A failure of a cache node process will cause the 12 disks in the module to become unavailable, triggering a rebuild.

Because the heartbeat monitoring code will fail only one cache node process, there is no risk of double module failure or data loss.

Cause

A backward time shift at the system level can cause the proc_sync_remote thread within the cache node to sleep longer than expected and miss a heartbeat, which makes the cache node conclude that it has failed.

  • This event occurs on all of the cache nodes, but the manager allows only one cache node to fail in order to avoid a data loss state.
  • This happens only when the thread is dormant. If a copy service sync job is running, it will not happen, because the thread is active and does not sleep.
  • Changes to the timezone of the machine, such as moving between daylight saving and standard time, do not affect the underlying system time and will not cause this issue.
When changing from daylight saving time to standard time, change the timezone using the xcli timezone_set command. Do not change the system time backward using the xcli time_set command.
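
The mechanism described above can be illustrated with a small model. This is only a sketch, not the XIV implementation: it assumes that the dormant thread's wake-up is scheduled as an absolute wall-clock time, and the interval, timeout, and function names are hypothetical values chosen for the example.

```python
# Illustrative model only; not XIV code. All names and values are hypothetical.
HEARTBEAT_INTERVAL = 5.0   # assumed seconds between heartbeats
HEARTBEAT_TIMEOUT = 15.0   # assumed silence after which a failure is declared


def seconds_until_wakeup(wake_at_wall: float, wall_now: float) -> float:
    """Time a sleeper waits when its wake-up is an absolute wall-clock time."""
    return max(0.0, wake_at_wall - wall_now)


def heartbeat_gap(clock_step: float) -> float:
    """Real time between two heartbeats if the wall clock is stepped by
    clock_step seconds (negative = backward) just after the thread sleeps."""
    wall_at_sleep = 1000.0
    wake_at_wall = wall_at_sleep + HEARTBEAT_INTERVAL  # scheduled wake-up
    wall_after_step = wall_at_sleep + clock_step       # clock is changed
    return seconds_until_wakeup(wake_at_wall, wall_after_step)


def misses_heartbeat(clock_step: float) -> bool:
    """True if the gap between heartbeats exceeds the monitor's timeout."""
    return heartbeat_gap(clock_step) > HEARTBEAT_TIMEOUT


if __name__ == "__main__":
    for step in (0.0, -60.0):
        print(step, heartbeat_gap(step), misses_heartbeat(step))
```

In this model, a 60-second backward step stretches a 5-second sleep to 65 seconds, which exceeds the assumed timeout. A timezone change does not alter the underlying system time, so it corresponds to a step of zero here.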

Environment

10.2.2 & 10.2.2.a

Resolving the problem

A fix is planned for version 10.2.4.

Document information


More support for:

2810 - XIV Storage System

Version:

Version Independent

Operating system(s):

Platform Independent

Software edition:

N/A

Reference #:

S1003712

Modified date:

2011-10-31
