No failover in HA/HADR v10.1.0.1/2 cluster due to missing relationships

Technote (troubleshooting)


Problem (Abstract)

A network outage affected the public NIC on the primary HADR server but there was no automated failover. The only change in the 'lssam' output was the NIC (on the primary server) within the db2_public_network_0 equivalency changing to Offline.

Symptom

"lssam" output:

Online IBM.ResourceGroup:db2_db2adm1_db2adm1_MYDB-rg Nominal=Online
   |- Online IBM.Application:db2_db2adm1_db2adm1_MYDB-rs
        |- Online IBM.Application:db2_db2adm1_db2adm1_MYDB-rs:node01
        '- Offline IBM.Application:db2_db2adm1_db2adm1_MYDB-rs:node02
Online IBM.ResourceGroup:db2_db2adm1_node01_0-rg Nominal=Online
   '- Online IBM.Application:db2_db2adm1_node01_0-rs
        '- Online IBM.Application:db2_db2adm1_node01_0-rs:node01
Online IBM.ResourceGroup:db2_db2adm1_node02_0-rg Nominal=Online
   '- Online IBM.Application:db2_db2adm1_node02_0-rs
         '- Online IBM.Application:db2_db2adm1_node02_0-rs:node02
Online IBM.Equivalency:db2_db2adm1_db2adm1_MYDB-rg_group-equ
   |- Online IBM.PeerNode:node01:node01
   '- Online IBM.PeerNode:node02:node02
Online IBM.Equivalency:db2_db2adm1_node01_0-rg_group-equ
   '- Online IBM.PeerNode:node01:node01
Online IBM.Equivalency:db2_db2adm1_node02_0-rg_group-equ
   '- Online IBM.PeerNode:node02:node02
Online IBM.Equivalency:db2_public_network_0
   |- Offline IBM.NetworkInterface:en0:node01
   '- Online IBM.NetworkInterface:en0:node02

Notice that the "en0" NIC on "node01" is shown as Offline, but the DB2 instance and HADR database resource are still Online on node01.
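The symptom above can be spotted mechanically in saved "lssam" output. The following is a minimal sketch, not part of the original technote: the function name find_stranded_resources is hypothetical, and it assumes plain-text "lssam" output (no colour codes) in the layout shown above. It prints each IBM.Application resource that is Online on a node whose NIC is reported Offline.

```shell
#!/bin/sh
# Hypothetical helper (not an IBM tool): reads lssam output on stdin and
# prints every per-node IBM.Application entry that is Online on a node
# where an IBM.NetworkInterface entry is Offline.
find_stranded_resources() {
    awk '
        {
            # Find the "State Class:Name:Node" pair on this line.
            state = ""; res = ""
            for (i = 1; i < NF; i++) {
                if ($i == "Online" || $i == "Offline") {
                    state = $i; res = $(i + 1); break
                }
            }
            if (state == "") next
            n = split(res, part, ":")
            if (n < 3) next                 # skip entries without a node suffix
            node = part[n]
            if (part[1] == "IBM.NetworkInterface" && state == "Offline")
                nic_down[node] = 1
            if (part[1] == "IBM.Application" && state == "Online") {
                apps[++count] = res
                app_node[count] = node
            }
        }
        END {
            # Report only after the whole listing is read, because the
            # network equivalency appears after the resource groups.
            for (j = 1; j <= count; j++)
                if (app_node[j] in nic_down) print apps[j]
        }
    '
}
```

For example, saving the listing to a file and running "find_stranded_resources < lssam.out" would list the instance and HADR resources still Online on node01. An empty result means no resource is Online on a node with an Offline NIC.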

Cause

In DB2 v10.1.0.0, v10.1.0.1, and v10.1.0.2, db2haicu does not create the relationships shown under "Diagnosing the problem" below, whether it is run with an XML file or in interactive mode. This is a known bug (APARs IC91667 and IC91816) in the DB2 "db2haicu" utility that should be resolved as of v10.1.0.3.
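As a small illustration of the affected range, the sketch below checks a DB2 level string against the three levels named above. The function name is_affected_level is hypothetical, and the level string is assumed to be supplied by the administrator (for example, read from "db2level" output) rather than detected automatically.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 when the given DB2 level is one of the
# levels this technote names as affected (v10.1.0.0 through v10.1.0.2).
is_affected_level() {
    level=${1#v}                      # accept "v10.1.0.2" or "10.1.0.2"
    case "$level" in
        10.1.0.0|10.1.0.1|10.1.0.2) return 0 ;;
        *) return 1 ;;
    esac
}
```

A call such as "is_affected_level v10.1.0.2 && echo affected" would print "affected". The check says nothing about whether the relationships were later added by hand, so the "lsrel -Ab" output remains the authoritative test.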

Diagnosing the problem

Normally, a problem with the public network forces the DB2 instance on the affected node offline, which triggers automation, including an eventual failover. In an HADR environment, the HADR database is also forced offline, which triggers an eventual failover (takeover) of the HADR database. However, a resource is forced offline only if a "DependsOn" relationship exists between that resource and the "db2_public_network_0" equivalency. Here's the 'lsrel -Ab' output showing the expected relationships for a typical HADR environment:


Managed Relationship 1:
  Class:Resource:Node[Source] = IBM.Application:db2_db2inst1_node01_0-rs
  Class:Resource:Node[Target] = {IBM.Equivalency:db2_public_network_0}
  Relationship                = DependsOn
  Conditional                 = NoCondition
  Name                        = db2_db2inst1_node01_0-rs_DependsOn_db2_public_network_0-rel
  ActivePeerDomain            = hadrdom
  ConfigValidity              =

Managed Relationship 2:
  Class:Resource:Node[Source] = IBM.Application:db2_db2inst1_node02_0-rs
  Class:Resource:Node[Target] = {IBM.Equivalency:db2_public_network_0}
  Relationship                = DependsOn
  Conditional                 = NoCondition
  Name                        = db2_db2inst1_node02_0-rs_DependsOn_db2_public_network_0-rel
  ActivePeerDomain            = hadrdom
  ConfigValidity              =

Managed Relationship 3:
  Class:Resource:Node[Source] = IBM.Application:db2_db2inst1_db2inst1_MYDB-rs
  Class:Resource:Node[Target] = {IBM.Equivalency:db2_public_network_0}
  Relationship                = DependsOn
  Conditional                 = NoCondition
  Name                        = db2_db2inst1_db2inst1_MYDB-rs_DependsOn_db2_public_network_0-rel
  ActivePeerDomain            = hadrdom
  ConfigValidity              =


Here's the 'lsrel -Ab' output showing the expected relationship for a typical HA shared disk environment:

Managed Relationship 1:
  Class:Resource:Node[Source] = IBM.Application:db2_db2inst1_node01_0-rs
  Class:Resource:Node[Target] = {IBM.Equivalency:db2_public_network_0}
  Relationship                = DependsOn
  Conditional                 = NoCondition
  Name                        = db2_db2inst1_node01_0-rs_DependsOn_db2_public_network_0-rel
  ActivePeerDomain            = hadrdom
  ConfigValidity              =
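To compare a cluster against the expected listings above, the 'lsrel -Ab' output can be filtered for "DependsOn" relationships that target the public network equivalency. The following is a minimal sketch under stated assumptions: the function name list_public_network_dependents is hypothetical, and the parsing assumes the "attribute = value" layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads 'lsrel -Ab' output on stdin and prints the
# source resource of every DependsOn relationship whose target is the
# db2_public_network_0 equivalency.
list_public_network_dependents() {
    awk -F' = ' '
        /^Managed Relationship/  { src = ""; tgt = "" }   # new record
        $1 ~ /\[Source\]/        { src = $2 }
        $1 ~ /\[Target\]/        { tgt = $2 }
        $1 ~ /^ *Relationship *$/ {
            # Source and Target precede this line within each record.
            if ($2 ~ /^DependsOn/ && tgt ~ /IBM\.Equivalency:db2_public_network_0/)
                print src
        }
    '
}
```

Run as "lsrel -Ab | list_public_network_dependents". In a healthy HADR setup this should print three sources (both instance resources and the HADR database resource), and one source for HA shared disk. Empty output corresponds to the missing relationships described in this technote.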


Resolving the problem

The missing relationships need to be created manually. To help with this, the attached tarball contains a script called "add_rels.sh" and an associated config file called "ha_env.cfg".

add_rels_v2.tar

Untar the file into a directory of its own. Edit the "ha_env.cfg" file with the details of your HADR or HA Shared Disk environment. This includes setting the HATYPE parameter to either "HADR" for an HADR environment or "HA" for an HA Shared Disk environment.

Then, with the domain online and both nodes online, run "add_rels.sh" as root from either node, as follows:
./add_rels.sh -c ha_env.cfg

Note any errors.
Use 'lsrel -Ab' to confirm that relationships similar to those listed above were created.
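For reference, the relationships listed above follow a regular naming pattern, so the equivalent System Automation "mkrel" commands can be written out by hand. The sketch below is a dry run that only prints the commands. It is an illustration, not the contents of "add_rels.sh" (which is not reproduced here); the function name print_mkrel_commands is hypothetical, and the "mkrel" invocation should be checked against the System Automation command reference before anything is run on a live domain.

```shell
#!/bin/sh
# Hypothetical dry-run helper: for each resource name given as an argument,
# print (do not execute) a mkrel command that would create a DependsOn
# relationship to the db2_public_network_0 equivalency, named after the
# pattern seen in the lsrel output above.
print_mkrel_commands() {
    for rs in "$@"; do
        echo "mkrel -p DependsOn -S IBM.Application:${rs} -G IBM.Equivalency:db2_public_network_0 ${rs}_DependsOn_db2_public_network_0-rel"
    done
}
```

For the HADR example above, the call would be "print_mkrel_commands db2_db2inst1_node01_0-rs db2_db2inst1_node02_0-rs db2_db2inst1_db2inst1_MYDB-rs". Printing rather than executing keeps the sketch safe to try, since any prerequisites for changing relationships in a running domain are not covered by this technote.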



Document information


More support for:

Tivoli System Automation for Multiplatforms

Software version:

3.2.2

Operating system(s):

AIX, Linux

Reference #:

1634431

Modified date:

2013-08-01
