The mountV97_monitor.ksh returns false 'path does not exist' errors in an IBM Smart Analytics System

Technote (troubleshooting)


Problem (Abstract)

Some customers have seen the default file system mount monitor script, mountV97_monitor.ksh, report false failures, leading to unnecessary failovers.

Symptom

mountV97_monitor.ksh returns false 'path does not exist' errors.


Cause

Unconfirmed, but suspected to be a network issue, most likely load related.

Environment

IBM Smart Analytics System using the TSA monitor scripts from the HA Toolkit here:

Diagnosing the problem

If your ISAS environment has had nodes fail over for 'path not found' errors, yet there are no discernible disk or mount problems that would explain the errors, then a revised version of the mountV97_monitor.ksh script may help.


The default mountV97_monitor.ksh script attempts to touch a file called .mp_monitor.{Process ID} at the mount point. If the file is created successfully, it is removed and the script returns 1. If the file cannot be created, the test fails: mountV97_stop.ksh is called and the script returns 3.
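The single-attempt check described above can be sketched roughly as follows. This is an illustrative sketch only, not the HA Toolkit script itself: the function name check_mount_once and the STOP_CMD placeholder (standing in for the call to mountV97_stop.ksh) are assumptions made for the example.

```shell
#!/bin/ksh
# Hypothetical sketch of the default check; NOT the actual HA Toolkit script.
# Return codes follow the technote: 1 = touch succeeded, 3 = touch failed.

check_mount_once() {
    mount_point=$1
    probe="${mount_point}/.mp_monitor.$$"   # $$ is the process ID of this shell

    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"                      # clean up the probe file
        return 1
    fi

    # The real script calls mountV97_stop.ksh at this point. STOP_CMD is a
    # placeholder for that call and defaults to the no-op ':' in this sketch.
    ${STOP_CMD:-:} "$mount_point"
    return 3
}
```

With this sketch, a single failed touch is enough to produce return code 3, which is why one transient error can trigger a failover.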

The revised mountV97_monitor.ksh script tries the touch command up to six times, at five-second intervals, and logs any non-zero return codes from the touch command. This prevents intermittent failures (due to spikes in network load, for example) from triggering a node failover. It also permits better diagnosis of these intermittent problems without the disruption of a failover.
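A minimal sketch of this retry behaviour is shown below. It is not the revised script supplied by IBM Support. The function name check_mount_with_retry, the variables MAX_ATTEMPTS and RETRY_INTERVAL, the log message format, and logging to standard error are all assumptions made for illustration. Only the six attempts, the five-second interval, and the return codes come from this technote.

```shell
#!/bin/ksh
# Hypothetical sketch of the retry logic; NOT the revised script from IBM Support.
MAX_ATTEMPTS=${MAX_ATTEMPTS:-6}       # six attempts, per the technote
RETRY_INTERVAL=${RETRY_INTERVAL:-5}   # seconds between attempts, per the technote

check_mount_with_retry() {
    mount_point=$1
    probe="${mount_point}/.mp_monitor.$$"
    attempt=1

    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        touch_rc=0
        touch "$probe" 2>/dev/null || touch_rc=$?

        if [ "$touch_rc" -eq 0 ]; then
            rm -f "$probe"
            return 1                  # touch succeeded: mount is usable
        fi

        # Log every non-zero touch return code for later diagnosis.
        # (Writing to stderr is an assumption; the real script may log elsewhere.)
        echo "$(date) touch ${probe} failed: rc=${touch_rc} (attempt ${attempt}/${MAX_ATTEMPTS})" >&2

        # Wait before the next attempt, but not after the final one.
        if [ "$attempt" -lt "$MAX_ATTEMPTS" ]; then
            sleep "$RETRY_INTERVAL"
        fi
        attempt=$((attempt + 1))
    done

    return 3                          # every attempt failed
}
```

In this sketch the wait happens only between attempts, so a genuine failure is reported after five waits plus the time taken by the touch commands themselves. Whether the revised script also waits after the final attempt is not stated here.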

The downside of using this script is that in the event of a genuine 'path not found' situation, failover is delayed by up to 30 seconds (six attempts at five-second intervals). That is why these changes have not been made to the default version of the mountV97_monitor.ksh script in the HA Toolkit. The revised version is intended as a temporary replacement for the standard version until the root cause of the false failures is identified and fixed, not as a permanent change.

Resolving the problem

There is a revised version of the mountV97_monitor.ksh script which retries the touch command up to six times at five-second intervals (as compared to the default of one attempt), logs any failing return codes, and only fails over to a standby node after the sixth touch failure.

Contact IBM Support to obtain a revised script to address the issue.


Cross reference information
Segment: Information Management
Product: InfoSphere Balanced Warehouse
Component: Balanced Warehouse
Platform: AIX, Linux
Version: 9.7
Edition: Enterprise

Document information


More support for:

IBM Smart Analytics System

Software version:

9.7, 10.0

Operating system(s):

AIX 6.1, Linux

Software edition:

All Editions, Edition Independent

Reference #:

1640096

Modified date:

2013-07-23
