IBM Support

Additional procedures to set up GDPC on an AIX 10GE RoCE network using DB2 V10.5 FP7 or higher

Preventive Service Planning


Abstract

This document lists the additional software requirements and procedures to set up GDPC on an AIX 10GE RoCE network, starting from DB2 V10.5 FP7.

Content


The following is applicable only to the setup of GDPC on AIX 10GE RoCE from DB2 V10.5 FP7 onwards.


I. DB2 V10.5 FP7 new software requirement

The minimum AIX level is AIX 6.1 TL9 SP5.
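On AIX, the installed level can be checked with `oslevel -s`, which reports a string such as `6100-09-05-1524` (VRMF-TL-SP-build). The following is a minimal sketch of a level check, assuming that output format; the hardcoded level stands in for a real `oslevel -s` call:

```shell
# Minimum required AIX level for DB2 V10.5 FP7 GDPC: 6.1 TL9 SP5,
# which `oslevel -s` reports with the prefix 6100-09-05.
MIN_LEVEL="6100-09-05"

# On a real AIX host this would be: CURRENT_LEVEL=$(oslevel -s)
# Hardcoded here purely for illustration.
CURRENT_LEVEL="6100-09-05-1524"

# Keep only the VRMF-TL-SP prefix; the fixed-width fields make a sorted
# string comparison equivalent to a numeric one.
PREFIX=$(echo "$CURRENT_LEVEL" | cut -d- -f1-3)
LOWEST=$(printf '%s\n%s\n' "$MIN_LEVEL" "$PREFIX" | sort | head -1)

if [ "$LOWEST" = "$MIN_LEVEL" ]; then
    echo "AIX level $CURRENT_LEVEL meets the minimum ($MIN_LEVEL)"
else
    echo "AIX level $CURRENT_LEVEL is below the minimum ($MIN_LEVEL)"
fi
```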


II. DB2 V10.5 FP7 additional setup in DB2 instance

Before you begin: Follow the steps outlined in the following link to set up the 10GE RoCE network in a GDPC environment on AIX: Setting up a RoCE network in a GDPC environment (AIX).


A. Set up a new condition response on the public Ethernet for every host in the cluster


In a non-GDPC pureScale cluster, GPFS uses the public Ethernet IP subnet (usually associated with the hostname) for heartbeating among all hosts. A failure in this network is detected by GPFS directly through the loss of heartbeat, leading to the shutdown of GPFS on the impacted hosts. With GDPC on AIX 10GE RoCE (2 or 4 switches), the GPFS heartbeat IP subnet is moved to the second private Ethernet network (see section 6.2 for details). To preserve the automatic shutdown of GPFS when the public Ethernet network is down, a new condition response must be set up as instructed below:

1. Run the following on a host other than the tiebreaker host to create the new condition response pair on the public Ethernet:

    IP_ADDRESS="<IP address for the node>"
    COND_NAME=condrespV105_<node name>_condition_en0
    RESP_NAME="condrespV105_<node name>_response"

    /bin/mkcondition -r IBM.NetworkInterface -d 'Adapter is not online' -e 'OpState != 1' -D 'Adapter is online' -E 'OpState = 1' -m l -S c -s "IPAddress == '${IP_ADDRESS}'" ${COND_NAME}
    /bin/chcondition -L ${COND_NAME}
    /bin/mkcondresp ${COND_NAME} ${RESP_NAME}


2. Run the following commands to activate and lock the new condition response:

    /bin/startcondresp ${COND_NAME} ${RESP_NAME}
    /bin/rmcondresp -L ${COND_NAME} ${RESP_NAME}


3. Validate the network resiliency:

    /home/<DB2 instance ID>/sqllib/bin/db2cluster -cfs -list -network_resiliency -resources

The output should be similar to:

    ====> Conditions <====
    Displaying condition information:
    condition 1:
            Name                        = "condrespV105_node1_condition_en0"
            Node                        = "node1.torolab.ibm.com"
            MonitorStatus               = "Monitored"
            ResourceClass               = "IBM.NetworkInterface"
            EventExpression             = "OpState != 1"
            EventDescription            = "Adapter is not online"
            RearmExpression             = "OpState = 1"
            RearmDescription            = "Adapter is online"
            SelectionString             = "IPAddress == '<IP ADDRESS>'"
            Severity                    = "c"
            NodeNames                   = {}
            MgtScope                    = "l"
            Toggle                      = "Yes"
            EventBatchingInterval       = 0
            EventBatchingMaxEvents      = 0
            BatchedEventRetentionPeriod = 0
            BatchedEventMaxTotalSize    = 0
            RecordAuditLog              = "ALL"
     
    condition 2:
            Name                        = "condrespV105_node1_condition_en2"
            Node                        = "node1.torolab.ibm.com"
            MonitorStatus               = "Monitored"
            ResourceClass               = "IBM.NetworkInterface"
            EventExpression             = "OpState != 1"
            EventDescription            = "Adapter is not online"
            RearmExpression             = "OpState = 1"
            RearmDescription            = "Adapter is online"
            SelectionString             = "IPAddress == '<IP ADDRESS>'"
            Severity                    = "c"
            NodeNames                   = {}
            MgtScope                    = "l"
            Toggle                      = "Yes"
            EventBatchingInterval       = 0
            EventBatchingMaxEvents      = 0
            BatchedEventRetentionPeriod = 0
            BatchedEventMaxTotalSize    = 0
            RecordAuditLog              = "ALL"

    .
    .
    .

    ====> Responses <====
    Displaying response information:
            ResponseName    = "condrespV105_node1_response"
            Node            = "node1.torolab.ibm.com"
            Action          = "condrespV105_node1_response event handler"
            DaysOfWeek      = 1-7
            TimeOfDay       = 0000-2400
            ActionScript    = "/usr/sbin/rsct/sapolicies/db2/condrespV105_resp.ksh"
            ReturnCode      = 0
            CheckReturnCode = "y"
            EventType       = "A"
            StandardOut     = "n"
            EnvironmentVars = ""
            UndefRes        = "y"
            EventBatching   = "n"

    .
    .
    .

    ====> Associations <====
    Displaying condition with response information:
    condition-response link 1:
            Condition = "condrespV105_node1_condition_en0"
            Response  = "condrespV105_node1_response"
            Node      = "node1.torolab.ibm.com"
            State     = "Active"
    condition-response link 2:
            Condition = "condrespV105_node1_condition_en2"
            Response  = "condrespV105_node1_response"
            Node      = "node1.torolab.ibm.com"
            State     = "Active"


4. Repeat steps 1 to 3 on all other hosts except the tiebreaker host.
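The per-host commands in steps 1 and 2 can be collected into a small helper script. The sketch below is a dry run that only prints the commands it would issue; the IP address and node name are hypothetical placeholders to substitute per host, and the `echo` in `run` would be removed for real execution:

```shell
# Dry-run sketch of steps 1 and 2 for one host.
# IP_ADDRESS and NODE_NAME are hypothetical placeholders.
IP_ADDRESS="9.26.0.1"
NODE_NAME="node1"

COND_NAME="condrespV105_${NODE_NAME}_condition_en0"
RESP_NAME="condrespV105_${NODE_NAME}_response"

run() {
    # Prints the command instead of executing it; on a real host,
    # replace the echo with direct execution of "$@".
    echo "$@"
}

# Step 1: create the condition, lock it, and link it to the response.
run /bin/mkcondition -r IBM.NetworkInterface \
    -d 'Adapter is not online' -e 'OpState != 1' \
    -D 'Adapter is online' -E 'OpState = 1' \
    -m l -S c -s "IPAddress == '${IP_ADDRESS}'" "${COND_NAME}"
run /bin/chcondition -L "${COND_NAME}"
run /bin/mkcondresp "${COND_NAME}" "${RESP_NAME}"

# Step 2: activate and lock the condition/response association.
run /bin/startcondresp "${COND_NAME}" "${RESP_NAME}"
run /bin/rmcondresp -L "${COND_NAME}" "${RESP_NAME}"
```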


B. GPFS configuration parameters

1. Step 4w in the current V10.5 Knowledge Center GDPC configuration (http://www-01.ibm.com/support/knowledgecenter/api/content/nl/en-us/SSEPGG_10.5.0/com.ibm.db2.luw.qb.server.doc/doc/t0060670.html) is no longer required. The GPFS failureDetectionTime and leaseRecoveryWait parameters are set automatically by db2cluster when the equivalent cluster manager parameters (step 3) are set. If step 4w was performed, run the following commands to revert to the appropriate values:

    /usr/lpp/mmfs/bin/mmchconfig failureDetectionTime=48
    /usr/lpp/mmfs/bin/mmchconfig leaseRecoveryWait=35


2. Configure the two additional parameters:

    /usr/lpp/mmfs/bin/mmchconfig minMissedPingTimeout=70
    /usr/lpp/mmfs/bin/mmchconfig maxMissedPingTimeout=80
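The four GPFS settings from this section can be applied in one pass. The loop below is a dry-run sketch that only prints each mmchconfig command; drop the `echo` to execute them on a real host. Current values can afterwards be confirmed with /usr/lpp/mmfs/bin/mmlsconfig.

```shell
# Dry-run sketch: print the mmchconfig commands for the section B settings
# (the reverted step-4w values plus the two new ping timeouts).
for setting in failureDetectionTime=48 leaseRecoveryWait=35 \
               minMissedPingTimeout=70 maxMissedPingTimeout=80
do
    echo /usr/lpp/mmfs/bin/mmchconfig "$setting"
done
```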

Document information

More support for: DB2 for Linux, UNIX and Windows
Install/Migrate/Upgrade - Fixpak

Software version: 10.5

Operating system(s): AIX

Reference #: 1974233

Modified date: 06 January 2016
