IBM Support

db2ClusterPing tool: uDAPL ping connectivity test option for RDMA transport

Question & Answer


Question

How do I test uDAPL ping connectivity on RoCE or InfiniBand networks in a DB2 pureScale cluster that uses the RDMA transport?

Cause

You may need to validate uDAPL network connectivity before deploying a pureScale cluster. You may also need to do this after deployment, when a DB2 member fails to connect to the cluster caching facility (CF) while CF_TRANSPORT_METHOD specifies the RDMA transport.

Answer

Overview
When run with the -rdma_ping option, the standalone db2ClusterPing utility validates uDAPL connectivity between the interface adapters on the hosts in a pureScale cluster.

Note: Starting with Db2 V11.1, db2ClusterPing is no longer supported. Use the db2cluster -verify command instead to test the uDAPL connections across all hosts in the cluster.

Prerequisites
  • DB2 installation is not required, except when using the pureScale instance configuration (Usage 2).
  • db2ClusterPing standalone executable must be copied to the same path on each host in the cluster.
  • Passwordless SSH must be configured for the cluster.
  • RoCE and InfiniBand networks are supported on both AIX and Linux.
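Before you run the tool, it can help to check the copy and SSH prerequisites on each host. The following is a minimal sketch and is not part of db2ClusterPing. It assumes ssh is the remote shell, where BatchMode=yes makes ssh fail instead of prompting for a password. HOSTS and EXEC_PATH are example values to replace with your own.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of db2ClusterPing):
# for each host, confirm that the remote shell works without a password
# prompt and that the executable exists at the same path on that host.
# REMOTE_SHELL, EXEC_PATH and HOSTS are example values; override as needed.
REMOTE_SHELL=${REMOTE_SHELL:-"ssh -o BatchMode=yes"}
EXEC_PATH=${EXEC_PATH:-/home/user/db2ClusterPing}
HOSTS=${HOSTS:-"hostname1 hostname2"}

check_host() {
    # $1 = host; succeeds only if EXEC_PATH is executable on that host
    $REMOTE_SHELL "$1" test -x "$EXEC_PATH"
}

preflight() {
    # Report every host, and return non-zero if any host failed the check
    rc=0
    for h in $HOSTS; do
        if check_host "$h"; then
            echo "$h: OK"
        else
            echo "$h: cannot run $EXEC_PATH through the remote shell" >&2
            rc=1
        fi
    done
    return $rc
}
```

After you adjust the variables, call `preflight`. Every host should report OK before you run db2ClusterPing.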


Usage

You can use this tool to run an RDMA ping across the hosts in a cluster in two ways:

    • using a user-provided input file that lists the hostname/netname/interface adapter entries to test, or
    • automatically retrieving the hostname/netname/interface adapter information defined in the pureScale instance configuration.
Both usage scenarios are explained below.



I. Usage 1 - Providing an input file
 
  • With an input file, you can specify any number of hostname/netname/interface adapter entries to test, without needing a deployed pureScale instance.


    Format of input file:

    • The input file has the following format. Any line that starts with # is treated as a comment and skipped.

      • #Hostname     Netname           Interface-Adapter
        hostname1     hostname1-ib0   my_device-0
        hostname2     hostname2-ib1   my_device-1
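Here is a minimal sketch of preparing the input file. It writes the sample entries shown above to a file and checks that each entry has exactly three fields. The file name and the three-field check are illustrative assumptions. Treating blank lines as skippable is also an assumption about the tool. db2ClusterPing still validates the netname and interface adapter entries on each host itself.

```shell
#!/bin/sh
# Sketch: write the sample input file shown above, then sanity-check it.
# INPUT_FILE is an example path; pass the real path to -adapter_list.
INPUT_FILE=${INPUT_FILE:-./inputFile.txt}

cat > "$INPUT_FILE" <<'EOF'
#Hostname   Netname         Interface-Adapter
hostname1   hostname1-ib0   my_device-0
hostname2   hostname2-ib1   my_device-1
EOF

# check_input_file FILE: report non-comment, non-blank lines that do not
# have exactly three fields; returns non-zero if any are found.
check_input_file() {
    awk '/^#/ || NF == 0 { next }
         NF != 3 { printf "line %d: expected 3 fields, found %d\n", NR, NF; bad = 1 }
         END { exit bad }' "$1"
}

check_input_file "$INPUT_FILE" && echo "$INPUT_FILE looks well formed"
```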


    Selecting the interface adapter from /etc/dat.conf:
     
    • The tool uses the hostname to access each host remotely and validate the netname and interface adapter entries. The interface adapter is the first token of the /etc/dat.conf line that corresponds to the netname defined in /etc/hosts on each host.

      The following example lines in /etc/dat.conf match the netname hostname1-ib0:
       
      • Example line in /etc/dat.conf that would match for InfiniBand:
        my_device-0 u2.0 nonthreadsafe default /usr/lib/libdapl/libdapl2.a(shr_64.o) IBM.1.1 "/dev/iba0 1 ib0" " "

        Example line in /etc/dat.conf that would match for RoCE with IP support:
        my_device-0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "hostname1-ib0 0" ""

        Example line in /etc/dat.conf that would match for RoCE without IP support:
        my_device-0 u2.0 nonthreadsafe default /usr/lib/libdapl/libdapl2.a(shr_64.o) IBM.1.1 "/dev/roce0 1 10.1.1.18" " "
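To look up an adapter name by hand, you can use the hypothetical `find_adapter` function below. It prints the first token of every non-comment dat.conf line whose quoted device parameters contain a given word. That word can be the interface name (ib0), the netname (hostname1-ib0), or the IP address (10.1.1.18) from the examples above. The function only approximates the matching described above and is not db2ClusterPing's own lookup logic.

```shell
#!/bin/sh
# Hypothetical helper for a manual check (not db2ClusterPing's logic):
# print the adapter name (first token) of each non-comment line whose
# quoted device parameters contain the given word as a whole token.
find_adapter() {
    # $1 = word to look for, $2 = dat.conf path (default /etc/dat.conf)
    awk -v w="$1" '
        /^[ \t]*#/ { next }
        {
            # Split on double quotes: even-numbered pieces are quoted params
            n = split($0, part, "\"")
            for (i = 2; i <= n; i += 2) {
                m = split(part[i], tok, " ")
                for (j = 1; j <= m; j++)
                    if (tok[j] == w) { print $1; next }
            }
        }' "${2:-/etc/dat.conf}"
}
```

For example, on hostname1, `find_adapter hostname1-ib0` should print my_device-0 for the RoCE-with-IP line shown above.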


    Command description:
     
    • To test using an input file, run the following command:

      • db2ClusterPing -rdma_ping -exec_path <full path to db2ClusterPing> -adapter_list <path to input file>  [ OPTIONS ... ]

      Here are the descriptions of the arguments:
       
      • -exec_path <full path to db2ClusterPing>: The full path, including the file name, of the db2ClusterPing standalone executable. The path must be the same on all hosts.

        -adapter_list <path to input file>: Performs the RDMA ping using the device information in the input file, which can list any number of hostname/netname/interface adapter entries. The tool validates that RDMA ping works between all of the listed adapters.

        Optional arguments: Refer to the section below for a list of optional arguments.


    Example:
     
    • Assume the input file contents above are saved in /home/user/inputFile.txt and db2ClusterPing is located in /home/user on every host:

      • db2ClusterPing -rdma_ping -exec_path /home/user/db2ClusterPing -adapter_list /home/user/inputFile.txt



      Here is the output from the program:

      • Starting cluster check tool
        Detailed Log can be found in /tmp/db2ClusterPing-150924_133733.log
        Detailed error tracing can be found in /tmp/db2ClusterPing-150924_133733.trace
        Performing udapl ping test from hostname1-ib0 to hostname1-ib0. Status: PASS
        Performing udapl ping test from hostname2-ib1 to hostname1-ib0. Status: PASS
        Performing udapl ping test from hostname1-ib0 to hostname2-ib1. Status: PASS
        Performing udapl ping test from hostname2-ib1 to hostname2-ib1. Status: PASS

      A status of PASS means that the ping from the first netname (acting as the uDAPL client) to the second netname (acting as the uDAPL server) succeeded. FAIL means that it did not. If a test fails, check the log file /tmp/db2ClusterPing-150924_133733.log for uDAPL error codes.
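When many netnames are tested, a short filter makes failures easier to spot. This sketch assumes that db2ClusterPing writes its result lines to standard output in the format shown above. `summarize_ping` is a hypothetical helper, not part of the tool. It lists each FAIL with its client and server netnames and points to the log file reported by the tool. It returns non-zero if any test failed or if no results were found.

```shell
#!/bin/sh
# Sketch: summarize db2ClusterPing -rdma_ping output read from stdin.
# Assumes result lines match the sample output:
#   Performing udapl ping test from <client> to <server>. Status: PASS|FAIL
summarize_ping() {
    awk '
        /Detailed Log can be found in/ { logfile = $NF }
        /Performing udapl ping test from/ {
            total++
            if ($NF == "FAIL") {
                fail++
                dst = $8; sub(/\.$/, "", dst)   # drop the trailing period
                print "FAILED: " $6 " (client) -> " dst " (server)"
            }
        }
        END {
            printf "%d tests, %d failed\n", total, fail
            if (fail > 0 && logfile != "") print "Check " logfile " for uDAPL error codes"
            exit (total == 0 || fail > 0)
        }'
}
```

For example: db2ClusterPing -rdma_ping -exec_path /home/user/db2ClusterPing -adapter_list /home/user/inputFile.txt | tee ping.out | summarize_ping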



II. Usage 2 - Using the pureScale instance configuration
 
  • If DB2 is installed and a pureScale instance has been created, the tool can automatically retrieve the netnames of all hosts defined in the pureScale instance configuration and use them to run the RDMA ping test for all possible unique combinations.


    Command description:
     
    • Run the following command:

      • db2ClusterPing -rdma_ping -exec_path <full path to db2ClusterPing> -instance_shared_dir <path to sqllib_shared>  [ OPTIONS ... ]


      Here are the descriptions of the arguments:
       
      • -exec_path <full path to db2ClusterPing>: The full path, including the file name, of the db2ClusterPing standalone executable. The path must be the same on all hosts.

        -instance_shared_dir <path to sqllib_shared>: The DB2 pureScale instance shared directory. It holds the instance shared files and the default database path and is accessible by all hosts in the instance. The tool retrieves the netnames of all hosts defined in the pureScale instance configuration and uses them to run the RDMA ping test for all possible unique combinations. This option is supported only on DB2 Version 10.5 or later.

        Optional arguments: Refer to the section below for a list of optional arguments.


    Example:
     
    • The following cluster setup is reported by the db2instance -list command:
       
      • $ db2instance -list
      • 0 MEMBER STOPPED hostname1    hostname1 NO 0 0 hostname1-ib0
        1 MEMBER STOPPED hostname2    hostname2 NO 0 0 hostname2-ib0
        128 CF STOPPED hostname1    hostname1 NO - 0 hostname1-ib0
        129 CF STOPPED hostname2    hostname2 NO - 0 hostname2-ib0

      To run the RDMA ping with db2ClusterPing located in /home/user, use the following command:

      • db2ClusterPing -rdma_ping -exec_path /home/user/db2ClusterPing -instance_shared_dir /path/to/sqllib_shared



      Based on the instance configuration, db2ClusterPing extracts the following hostname/netname pairs. The interface adapter for each pair is determined automatically from /etc/dat.conf.
       
      • hostname1    /    hostname1-ib0
        hostname2    /    hostname2-ib0

      Here is the sample output from running the program:

      • Starting cluster check tool
        Detailed Log can be found in /tmp/db2ClusterPing-150924_132400.log
        Detailed error tracing can be found in /tmp/db2ClusterPing-150924_132400.trace
        Performing udapl ping test from hostname1-ib0 to hostname1-ib0. Status: PASS
        Performing udapl ping test from hostname2-ib0 to hostname1-ib0. Status: PASS
        Performing udapl ping test from hostname1-ib0 to hostname2-ib0. Status: PASS
        Performing udapl ping test from hostname2-ib0 to hostname2-ib0. Status: PASS



      A status of PASS means that the ping from the first netname (acting as the uDAPL client) to the second netname (acting as the uDAPL server) succeeded. FAIL means that it did not.
      If a test fails, check the log file /tmp/db2ClusterPing-150924_132400.log for uDAPL error codes.


Optional Arguments

The following arguments are optional:
 
  • -num_pings <number of pings>: The number of times the client pings the server. The default value is 10 pings.

    -message_size <buffer size>: The ping message size in bytes between client and server. The default value is 100 bytes.

    -timeout <timeout in seconds>: The amount of time a server or client waits for a connection from its peer. The default value is 20 seconds.

    -port <connection port>: The port on which the server process listens for a client's connection. The default value is 57934. Choose a port that is not used by other applications.

    -num_connections <number of connections>: The number of times the client repeats the ping test by reconnecting to the server and pinging again. The default value is 1 connection.

    -remote_shell_cmd <cmd>: The full path of the remote shell command used to run the client and server cluster checks on the target hosts. The default value is ssh.
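You can also collect the optional arguments in a small wrapper, so that their values are visible and easy to tune. `run_rdma_ping` is a hypothetical helper for the input-file usage. It passes every optional argument explicitly and uses the documented defaults unless you set the matching environment variable: NUM_PINGS, MESSAGE_SIZE, TIMEOUT, PORT, NUM_CONNECTIONS, or REMOTE_SHELL_CMD. Because -remote_shell_cmd expects a full path, the wrapper defaults it to /usr/bin/ssh, which assumes that is where ssh is installed. If DRY_RUN is set, the wrapper prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper: build a db2ClusterPing -rdma_ping command line with
# every optional argument set explicitly (documented defaults unless the
# matching environment variable is set). DRY_RUN=1 only prints the command.
run_rdma_ping() {
    # $1 = full path to db2ClusterPing, $2 = input file for -adapter_list
    set -- "$1" -rdma_ping -exec_path "$1" -adapter_list "$2" \
        -num_pings "${NUM_PINGS:-10}" \
        -message_size "${MESSAGE_SIZE:-100}" \
        -timeout "${TIMEOUT:-20}" \
        -port "${PORT:-57934}" \
        -num_connections "${NUM_CONNECTIONS:-1}" \
        -remote_shell_cmd "${REMOTE_SHELL_CMD:-/usr/bin/ssh}"
    # With DRY_RUN set this expands to `echo ...`; otherwise it runs the tool
    ${DRY_RUN:+echo} "$@"
}
```

For example, set PORT=2045 and MESSAGE_SIZE=1000, then run: run_rdma_ping /home/user/db2ClusterPing /home/user/inputFile.txt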




Examples:
 
  • Run the RDMA ping using an input file, specifying the number of pings, the ping message size, and the port.
    • ./db2ClusterPing -rdma_ping -exec_path /home/user/db2ClusterPing -adapter_list input_file -num_pings 5 -message_size 1000 -port 2045

  • Run the RDMA ping using an input file, specifying the remote shell command and the number of times a client repeats the ping test with a server.

    • ./db2ClusterPing -rdma_ping -exec_path /home/user/db2ClusterPing -adapter_list my_input_file -remote_shell_cmd /usr/bin/ssh -num_connections 3



  • Run the RDMA ping using the instance shared directory, specifying the server and client timeout (in seconds).

    • ./db2ClusterPing -rdma_ping -exec_path /home/user/db2ClusterPing -instance_shared_dir /home/user/sqllib_shared -timeout 25
 
  • Run the RDMA ping using the instance shared directory, specifying db2locssh as the remote shell command.
 
    • ./db2ClusterPing -rdma_ping -exec_path /home/user/db2ClusterPing -instance_shared_dir /home/user/sqllib_shared -remote_shell_cmd /var/db2/db2ssh/db2locssh



'db2ClusterPing' Executables for AIX and Linux

For AIX, use the following executable:
db2ClusterPing


For Linux, use the following executable:
db2ClusterPing

Product: Db2 for Linux, UNIX and Windows | Component: High Availability - PureScale | Platforms: AIX, Linux | Versions: 10.1, 10.5

Document Information

Modified date:
18 June 2020

UID

swg21967473