Troubleshooting
Problem
When HADR is used with Tivoli System Automation for Multiplatforms (TSAMP) 3.2.1.1 and a network cable is unplugged, the HADR takeover does not complete and fails.
Environment
HADR environment with TSAMP versions earlier than 3.2.1.3
Diagnosing The Problem
At first, two additional relationships were added to
Add a new relationship for your HADR database by running the following commands as root:
rgreq -o lock HADR-rg
mkrel -p dependson -S IBM.Application:HADR-rs -G IBM.Equivalency:db2_public_network_0 \
    HADR-rs_DependsOn_db2_public_network_0-rel
rgreq -o unlock HADR-rg
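If you apply this relationship change from a script, it may help to make sure the resource group is unlocked again even when `mkrel` fails, so that the group is not left locked. The following is a minimal POSIX shell sketch under that assumption; the function name `add_network_dependency` and its parameters are hypothetical, and it only wraps the `rgreq` and `mkrel` invocations shown above.

```shell
# Hypothetical wrapper (not part of TSAMP): lock the resource group, create the
# DependsOn relationship, then always unlock the group, even if mkrel fails.
# Run as root on a cluster node.
add_network_dependency() {
    rg=$1     # resource group, e.g. HADR-rg
    rs=$2     # application resource, e.g. HADR-rs
    equ=$3    # network equivalency, e.g. db2_public_network_0

    rgreq -o lock "$rg" || return 1

    mkrel -p dependson -S "IBM.Application:$rs" -G "IBM.Equivalency:$equ" \
        "${rs}_DependsOn_${equ}-rel"
    rc=$?

    # Unlock regardless of the mkrel result, then report mkrel's exit status.
    rgreq -o unlock "$rg"
    return "$rc"
}
```

For the resources in this technote the call would be `add_network_dependency HADR-rg HADR-rs db2_public_network_0`, which builds the same relationship name as the manual command.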
However, this did not completely solve the problem.
The trace shows that the HADR resource is not stopped as it should be. As soon as DB2 reaches the HADR disconnected state, a lock is set on the resource group, which prevents TSAMP from taking any further action. In this case the IBM.ServiceIP resource was not stopped, and therefore the resource group could not be started on the other node.
09/02/14 10:12:08.948728 T(4141890448) _RCD ReportState: Resource : eth0/Fixed/IBM.NetworkInterface/db2b reported state change: 2
09/02/14 10:12:08.967404 T(4141890448) _RCD Resource::doRIBMAction Offline Request against db2-rs on node db2b.
09/02/14 10:12:08.967438 T(4141890448) _RCD Resource::doRIBMAction Offline Request against HADR-rs on node db2b.
09/02/14 10:12:08.988111 T(4141890448) _RCD ReportState: Resource : HADR-rs/Fixed/IBM.Application/db2b reported state change: 6
09/02/14 10:12:08.994833 T(4141890448) _RCD ReportState: Resource : db2-rs/Fixed/IBM.Application/db2b reported state change: 6
09/02/14 10:12:13.338019 T(4141890448) _RCD ReportState: Resource : HADR-rs/Fixed/IBM.Application/db2b reported state change: 2
09/02/14 10:12:16.763561 T(4141890448) _RCD LockResource request injected: HADR-rg/ResGroup/IBM.ResourceGroup
09/02/14 10:12:20.856603 T(4141890448) _RCD ReportState: Resource : db2-rs/Fixed/IBM.Application/db2b reported state change: 2
09/02/14 10:24:04.687419 T(4141890448) _RCD ReportState: Resource : eth0/Fixed/IBM.NetworkInterface/db2b reported state change: 1
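Each `ReportState` line ends with a numeric state code. Assuming these codes follow the RSCT operational state (OpState) convention (0 Unknown, 1 Online, 2 Offline, 3 Failed offline, 4 Stuck online, 5 Pending online, 6 Pending offline), the hypothetical `annotate_states` helper below appends a readable name to each such line of a trace excerpt. Verify the mapping against your RSCT documentation before relying on it.

```shell
# Hypothetical helper: append a state name to every
# "reported state change: N" line; other lines pass through unchanged.
# Mapping ASSUMES the RSCT OpState convention (0..6).
# Reads the files given as arguments, or standard input if there are none.
annotate_states() {
    awk '
        BEGIN {
            # split() fills name[1]..name[7], so code N maps to name[N + 1]
            split("Unknown,Online,Offline,Failed offline,Stuck online,Pending online,Pending offline", name, ",")
        }
        { sub(/\r$/, "") }                     # tolerate CRLF line endings
        /reported state change: [0-6]$/ {
            print $0 " (" name[$NF + 1] ")"
            next
        }
        { print }
    ' "$@"
}
```

Under this assumption, for example, `HADR-rs ... reported state change: 6` would be annotated as `(Pending offline)`.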
The trace does not show the IBM.ServiceIP resource being stopped, which would appear as a line such as:
09/04/14 09:23:39.927985 T(4134083472) _RCD ReportState: Resource : db2ip_172_xx_xx_xx-rs/Fixed/IBM.ServiceIP/db2b reported state change: 2
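To check a longer trace for the symptom described above (a lock injected on the resource group, but no IBM.ServiceIP resource reporting state 2), a small filter can help. This is a minimal sketch that assumes the trace has already been formatted as text lines like the ones shown here; the function name `scan_trace` and its output format are hypothetical, and state 2 is assumed to mean Offline.

```shell
# Hypothetical helper: summarise a TSAMP trace excerpt in the format shown above.
# Prints the date/time of the last "LockResource request injected" line and of
# the last IBM.ServiceIP resource reporting state 2, or "none" for either.
# Reads the files given as arguments, or standard input if there are none.
scan_trace() {
    awk '
        { sub(/\r$/, "") }                       # tolerate CRLF line endings
        /LockResource request injected:/ {
            lock = $1 " " $2                     # date and time fields
        }
        /IBM\.ServiceIP/ && /reported state change: 2$/ {
            sip_off = $1 " " $2
        }
        END {
            print "lock: " (lock != "" ? lock : "none")
            print "serviceip-offline: " (sip_off != "" ? sip_off : "none")
        }
    ' "$@"
}
```

On an excerpt like the failing trace above, a `lock:` timestamp combined with `serviceip-offline: none` corresponds to the behavior described in this technote.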
Resolving The Problem
It was found that this is the same problem as described in APAR IV03673.
This APAR is fixed in TSAMP version 3.2.1.3 and newer.
The APAR description is: RESOURCES RESET AFTER SUCCESSFUL START IN CASE ANOTHER RESOURCE STARTED AT THE SAME TIME HAS A LONG RUNNING STARTCOMMAND
Although the APAR description does not mention this scenario, it was confirmed that the root cause is the same.
The problem can be resolved by installing fix pack 3.2.1.3 or newer, such as 3.2.2.8 (the most recent 3.2.2.x fix pack at the time of writing).
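When checking many nodes, a version comparison can tell whether a given TSAMP level is at or above 3.2.1.3. This is a minimal sketch: the function name `has_iv03673_fix` is hypothetical, the installed version is passed in as an argument (on many systems `lssrc -ls IBM.RecoveryRM` reports it, but confirm where your release shows it), and it relies on `sort -V`, which is a GNU coreutils extension and may not exist on every platform.

```shell
# Hypothetical helper: succeed (exit status 0) if the given TSAMP version is
# 3.2.1.3 or newer, the first level this technote lists with the APAR IV03673
# fix. Requires GNU sort with version ordering (-V).
has_iv03673_fix() {
    min="3.2.1.3"
    # The lower of the two versions sorts first; if that is min, $1 >= min.
    lowest=$(printf '%s\n%s\n' "$1" "$min" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}
```

For example, `has_iv03673_fix 3.2.1.1` fails, while `has_iv03673_fix 3.2.2.8` succeeds.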
Document Information
Modified date:
17 June 2018
UID
swg21684693