Configuring high availability

The high availability feature involves the use of a second Dispatcher machine. The first Dispatcher machine performs load balancing for all the client traffic, as it does in a single-Dispatcher configuration. The second Dispatcher machine monitors the "health" of the first and takes over the task of load balancing if it detects that the first Dispatcher machine has failed.

About this task

When you configure high availability, each of the two machines is assigned a specific role, either primary or backup. The primary machine sends connection data to the backup machine on an ongoing basis. While the primary is active (load balancing), the backup is in a standby state, continually updated and ready to take over, if necessary.

The communication sessions between the two machines are referred to as heartbeats. The heartbeats allow each machine to monitor the health of the other. If the backup machine detects that the active machine has failed, it takes over and begins load balancing. At that point the statuses of the two machines are reversed: the backup machine becomes active and the primary becomes standby.
Remember: In the high availability configuration, both primary and backup machines must be on the same subnet with identical configuration.
Tip: Tips for configuring high availability:
  1. To configure a single Dispatcher machine to route packets without a backup, do not issue any of the high availability commands at startup.
  2. To convert two Dispatcher machines that are configured for high availability to one machine that is running alone, stop the executor on one of the machines, then delete the high availability features (the heartbeats, reach, and backup) on the other machine.
  3. [Linux] Linux® for z/OS® operating systems: In both of the previous cases, you must alias the network interface card with the cluster addresses, as required.
  4. When you run two Dispatcher machines in a high availability configuration, unexpected results can occur if you set any of the parameters for the executor, cluster, port, or server (for example, port stickytime) to different values on the two machines.
  5. In most cases, position the high availability definitions at the end of the configuration file, after the cluster, port, and server statements. This ordering matters because, when high availability synchronizes, the cluster, port, and server definitions must already exist when a connection record is received.
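As a sketch of this ordering, a configuration file might look like the following; all addresses and the port number are placeholders for illustration only.

```shell
# Cluster, port, and server definitions come first ...
dscontrol cluster add 9.67.131.153
dscontrol port add 9.67.131.153@80
dscontrol server add 9.67.131.153@80@9.67.144.21
dscontrol server add 9.67.131.153@80@9.67.144.22
# ... and the high availability statements come last.
dscontrol highavailability heartbeat add 9.67.125.10 9.67.125.11
dscontrol highavailability backup add primary auto 12345
```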

Procedure

  1. You can create script files that are invoked on high availability status changes to report state changes and to manage addresses. For more information about the available scripts, see Scripts to run with high availability.
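For example, a script that runs when a machine becomes active typically aliases the cluster address on the network interface. The following is only a sketch; the interface name, netmask, and cluster address are placeholders for your environment, and the exact script names and locations are described in Scripts to run with high availability.

```shell
#!/bin/sh
# Sketch of a "go active" script: alias the cluster address on the
# network interface so this machine accepts traffic for the cluster.
# CLUSTER and INTERFACE are placeholder values.
CLUSTER=9.67.131.153
INTERFACE=eth0
ifconfig $INTERFACE:1 $CLUSTER netmask 255.255.255.0 up
```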
  2. [Linux] If you are running Linux for z/OS operating systems in layer 3 mode, create alias script files on each of the two Dispatcher machines.
    The scripts that you create should contain commands to complete the following tasks:
    • Configure the cluster IP address on the interface
    • Add an iptables rule to drop incoming packets that are destined to the cluster address
    For more information, see Configuring the Dispatcher machine.
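A minimal sketch of such an alias script, assuming the `ip` and `iptables` commands are available and using placeholder values for the interface name and cluster address:

```shell
#!/bin/sh
# Sketch of an alias script for Linux for z/OS in layer 3 mode.
# CLUSTER and INTERFACE are placeholders for your environment.
CLUSTER=9.67.131.153
INTERFACE=eth0
# Configure the cluster IP address on the interface.
ip addr add $CLUSTER/32 dev $INTERFACE
# Drop incoming packets that are destined to the cluster address.
iptables -t filter -A INPUT -d $CLUSTER -j DROP
```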
  3. Ensure that your primary load balancer is configured as wanted for load balancing in a stand-alone load balancer configuration before you add high availability to the setup. For guidance about initial settings, see Configuring the Dispatcher machine.
  4. Add the heartbeat information on the load balancer:
    dscontrol highavailability heartbeat add source_address destination_address

    source_address is the local load balancer and destination_address is the partner load balancer. The values can be either DNS names or IP addresses. If the load balancers have multiple interfaces, use the non-forwarding address.

    Advanced users: Multiple heartbeats are not typically necessary. On a heavily congested network, adding multiple heartbeats can reduce false takeovers, but on lightly used networks extra heartbeats can have the opposite effect. A single heartbeat is recommended.

    Advanced users: You can adjust the number of seconds that the load balancer waits for a heartbeat response before the backup load balancer becomes the active load balancer. The default is 2 seconds. Increase this value if failovers occur even though the primary load balancer does not appear to have failed. For example:
    dscontrol executor set hatimeout 3
  5. Configure the list of IP addresses that the Dispatcher must be able to reach to ensure full service, by using the reach add command.

    The default gateway is the recommended reach target but is not required. For more information, see Detecting server failures with heartbeats and reach targets. For example:

    dscontrol highavailability reach add 9.67.125.18 
  6. Add the backup information:
    dscontrol highavailability backup add primary [auto | manual] port

    The load balancers communicate by using the UDP protocol, so ensure that an unused port is provided. Port values greater than 2000 are recommended.
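Taken together, steps 4 through 6 on the primary machine might look like the following sketch; the addresses, recovery strategy, and port number are placeholders for illustration only.

```shell
# On the primary load balancer (9.67.125.10 is local,
# 9.67.125.11 is the partner; both are placeholder addresses).
dscontrol highavailability heartbeat add 9.67.125.10 9.67.125.11
# Optional: raise the heartbeat timeout from the 2-second default.
dscontrol executor set hatimeout 3
# Reach target (the default gateway, in this sketch).
dscontrol highavailability reach add 9.67.125.18
# This machine is the primary; use an unused UDP port greater than 2000.
dscontrol highavailability backup add primary auto 12345
```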

  7. Proceed with the configuration as follows:
    1. Configure your backup load balancer with the same stand-alone configuration that is used for the primary load balancer in step (3).
    2. Add the heartbeat that is defined in step (4) to the backup load balancer but reverse the order of all addresses.
    3. Add the same reach targets (if any) that were added to the primary load balancer in step (5).
    4. Finally, add the backup information that was added in step (6), but use the keyword backup instead of primary.
      dscontrol highavailability backup add backup [auto | manual] port
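Mirroring the primary-side sketch above, the backup machine's commands might look like the following; the addresses and port are the same placeholder values, with the heartbeat addresses reversed.

```shell
# On the backup load balancer: the heartbeat addresses are reversed
# (9.67.125.11 is now local), the reach target is the same, and the
# keyword backup replaces primary. The recovery strategy and port
# must match the values used on the primary.
dscontrol highavailability heartbeat add 9.67.125.11 9.67.125.10
dscontrol highavailability reach add 9.67.125.18
dscontrol highavailability backup add backup auto 12345
```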
  8. Check the high availability status on each machine:
    dscontrol highavailability status

    Ensure that each machine has the correct role (primary or backup) and state: the primary must be in active mode and the backup in standby mode. The recovery strategies must be the same on both machines.

  9. Optional: Enable replication of connection and affinity records.

    With this feature, connection and affinity records can be replicated between high availability partners. When the records are replicated, connection and affinity states are preserved so that the connections can continue even after takeover has taken place.

    • Enable replication for connection and affinity records:
      dscontrol port set cluster@port repstrategy both
    • Enable replication for only connection records:
      dscontrol port set cluster@port repstrategy connection
    • Enable replication for only affinity records:
      dscontrol port set cluster@port repstrategy affinity
    • Disable replication:
      dscontrol port set cluster@port repstrategy none

    For more information about this command, see the topic on the dscontrol port command.

    Important:

    Replication increases traffic between the load balancers and can reduce the capacity to distribute traffic. Connection record replication can be especially costly. Connections that do not need to be maintained across a takeover should not be replicated. Avoid replicating connection records for short-lived connections such as HTTP traffic, but consider connection replication for long-lived traffic such as LDAP.
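Following that guidance, a configuration might replicate only affinity records on a long-lived-traffic port; the cluster address and port below are placeholders.

```shell
# Replicate only affinity records for the LDAP port of a placeholder
# cluster, keeping replication cost lower than replicating connections.
dscontrol port set 9.67.131.153@389 repstrategy affinity
```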

  10. After the configuration is complete and verified, save it by using the dscontrol file save command. The load balancer saves the configuration in the proper order so that it is restored and operates as expected after the product is restarted. If you edit the configuration manually, place the high availability statements at the end of the configuration file, unless you have collocated servers that are defined by using MAC forwarding.
  11. Optional: [Linux] Suppress the ICMP port-unreachable packets that the operating system generates in response to heartbeat packets.
    # iptables -t filter -A INPUT -p udp --destination-port <port> -j DROP
    <port> is the heartbeat port number. For more information about this issue, read the technote ICMP Port unreachable sent when HA packet received.
  12. Start the manager and advisors on both machines.
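For example, on each machine this might be done with commands of the following form; the advisor name and port shown are placeholders for whatever protocols you load balance.

```shell
# Start the manager, then an advisor for each load-balanced protocol
# (HTTP on port 80 is shown only as an example).
dscontrol manager start
dscontrol advisor start http 80
```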