IBM Support

Why is failover of shared file systems slow on PureData System for Operational Analytics (PDOA), causing automatic failovers to fail for core warehouse hosts?

Question & Answer


Question

Why is failover of shared file systems slow on PureData System for Operational Analytics (PDOA), causing automatic failovers to fail for core warehouse hosts?

Cause

Shared file systems such as /db2home are mounted on all hosts by the General Parallel File System (GPFS) layer, and their availability is a prerequisite for a smooth failover on every host. When GPFS is slow to detect the failure and fail over these file systems, the automatic failover eventually times out and fails.

Answer

A closer look at the GPFS layer shows that, in most cases, some agent threads become overloaded by the detection and recovery work they must carry out to fail over the /db2home file system quickly.

To accelerate this process, increase the number of these threads.

The configuration change takes effect for the whole domain when it is made from the GPFS manager node, which is the admin node. The following GPFS log excerpt, taken from a gpfs.snap collection, shows a slow takeover of db2home:

./gpfs.snap.Host02_07211954.out.tar.gz_unpack/Host02_07211954/mmfs.logs.Host02:Fri Jun 17 15:34:59.164 2016: GPFS: 6027-630 Node 172.23.1.4 (Host02) appointed as manager for db2home.
./gpfs.snap.Host02_07211954.out.tar.gz_unpack/Host02_07211954/mmfs.logs.Host02:Fri Jun 17 15:35:03.760 2016: GPFS: 6027-611 Recovery: db2home, delay 16 sec. for safe recovery.
./gpfs.snap.Host02_07211954.out.tar.gz_unpack/Host02_07211954/mmfs.logs.Host02:Fri Jun 17 15:40:08.750 2016: GPFS: 6027-643 Node 172.23.1.4 (Host02) completed take over for db2home..

The log excerpt above shows that Host02 took just over 5 minutes (15:34:59 to 15:40:08) to take over /db2home. As a result, the failover times out and fails.
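The takeover time can be read directly from the two GPFS messages 6027-630 (appointed as manager) and 6027-643 (completed take over). The following is a minimal sketch, not an IBM-supplied tool: the function name takeover_seconds is hypothetical, and it assumes both messages are logged on the same calendar day.

```shell
#!/bin/sh
# Sketch: seconds between "appointed as manager" (6027-630) and
# "completed take over" (6027-643) for one file system.
# Assumption: both messages fall on the same calendar day.
takeover_seconds() {
    awk -v fs="$1" '
        # Convert the first HH:MM:SS.mmm timestamp on the line to seconds.
        function secs(line,    t) {
            if (!match(line, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+/))
                return -1
            split(substr(line, RSTART, RLENGTH), t, ":")
            return t[1] * 3600 + t[2] * 60 + t[3]
        }
        index($0, fs) && /6027-630/ { start = secs($0) }
        index($0, fs) && /6027-643/ { printf "%.3f\n", secs($0) - start }
    '
}

# Log lines from this technote, fed on standard input.
takeover_seconds db2home <<'EOF'
./gpfs.snap.Host02_07211954.out.tar.gz_unpack/Host02_07211954/mmfs.logs.Host02:Fri Jun 17 15:34:59.164 2016: GPFS: 6027-630 Node 172.23.1.4 (Host02) appointed as manager for db2home.
./gpfs.snap.Host02_07211954.out.tar.gz_unpack/Host02_07211954/mmfs.logs.Host02:Fri Jun 17 15:35:03.760 2016: GPFS: 6027-611 Recovery: db2home, delay 16 sec. for safe recovery.
./gpfs.snap.Host02_07211954.out.tar.gz_unpack/Host02_07211954/mmfs.logs.Host02:Fri Jun 17 15:40:08.750 2016: GPFS: 6027-643 Node 172.23.1.4 (Host02) completed take over for db2home..
EOF
```

For the excerpt in this technote the result is 309.586 seconds, a little over 5 minutes.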

To fix this, run the following command from the admin host:


mmchconfig tscWorkerPool=128

Then recycle (stop and restart) the GPFS software to make the change effective.
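The change and the recycle can be sketched as a small script. This is a hedged outline rather than the documented PDOA procedure: only the mmchconfig line comes from this technote. The mmshutdown, mmstartup, mmgetstate and mmlsconfig calls are standard GPFS commands used here as assumptions for the recycle and verification steps, and the run and apply_worker_pool helpers are hypothetical. On a PDOA system the stop and start are normally carried out in a planned outage through the appliance's own high-availability procedures. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set explicitly.
# Recycling GPFS takes the shared file systems offline on every node,
# so the real run belongs in a planned outage window.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_worker_pool() {
    run mmchconfig tscWorkerPool=128   # value given in this technote
    run mmshutdown -a                  # assumption: stop GPFS on all nodes
    run mmstartup -a                   # assumption: start GPFS on all nodes
    run mmgetstate -a                  # check that every node is active again
    run mmlsconfig tscWorkerPool       # assumption: shows the configured value
}

apply_worker_pool
```

Run as-is from the admin host, the script lists the five commands in order so that the sequence can be reviewed before anything is changed.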

Related Information

Product: PureData System for Operational Analytics A1801
Business Unit: IBM Software w/o TPS
Component: Not Applicable
Platform: AIX
Version: 1.0
Edition: All Editions
Line of Business: Data and AI

Product Synonym

PDOA

Document Information

Modified date:
17 October 2019

UID

swg21991409