IBM Support

Tips on configuring the high availability feature for Load Balancer

Troubleshooting


Problem

With the high availability function of Load Balancer, a takeover of packet distribution occurs if the primary partner fails or is shut down. When the backup partner assumes the packet-forwarding function, the cluster addresses (and return addresses, if used) are moved to the newly active Load Balancer. To maintain existing connections after a high availability takeover, you can configure connection and affinity replication. Numerous timing and configuration factors can affect the takeover operation.

Symptom

The tips in this technote help resolve issues that arise from high availability configuration problems, such as:
  • Connections dropped after takeover
  • Partner machines are unable to synchronize
  • Requests erroneously directed to the backup partner machine

Resolving The Problem

The following tips are helpful for successful configuration of high availability on your Load Balancer machines.
  1. The position of the high availability commands in your script files can make a significant difference. When you save the configuration by using the Load Balancer "save" function, the statements are written in the recommended order. As a best practice, let Load Balancer create the configuration file after you update the configuration.

    Examples of high availability commands are:
       dscontrol highavailability heartbeat add ...
       dscontrol highavailability backup add ...
       dscontrol highavailability reach add ...


    In most cases, you must position the high availability definitions at the end of the file. The cluster, port, and server statements must be placed before the high availability statements.

    Several issues can occur if the high availability statements are not placed at the end of the configuration. When high availability synchronization occurs, Load Balancer looks up the cluster, port, and server to process each replication record. If the cluster, port, or server does not exist, the connection record is dropped. If a takeover then occurs, the connection fails because its record was never replicated to the partner machine. If a takeover occurs before the clusters, ports, and servers are defined, a gratuitous ARP might not be sent for all necessary addresses. Forwarding then fails because routers continue to direct traffic to the partner Load Balancer, which is in standby mode after the takeover.
  2. The order differs when collocated servers and the MAC forwarding method are configured.

    On AIX, the high availability statements must come before the collocated server statements. Otherwise, Load Balancer receives a request for the collocated server and attempts to load balance it to a new server, creating a loop in which the same packet is repeatedly forwarded on the network.

    On Linux, a tunnel is created when a collocated server is defined, and the tunnel must be aliased with the cluster address in the goActive script. For this reason, high availability must be configured after all collocated servers are defined.
     
  3. On Linux for System z operating systems, the hypervisor controls the interface and multiplexes the real interface among the guest operating systems. The hypervisor permits only one guest at a time to register an address, and address registration changes are subject to an update window. When the cluster address is removed from the backup machine, a delay might be necessary before the cluster address is configured on the primary machine; otherwise, traffic continues to be sent to the wrong Load Balancer.

    To correct this behavior, add a sleep delay to the goActive script. The required sleep time depends on the deployment; start with a delay of 10 seconds.
     
  4. High availability partners must be able to communicate with each other and must be on the same subnet.

    By default, the machines attempt to communicate with each other every half second and detect a failure after two seconds with no communication received. If a Load Balancer is busy forwarding traffic and cannot answer the heartbeat within the timeout period, a failover occurs. To allow more time before a failure is detected, increase the timeout by issuing:

    dscontrol executor set hatimeout new_timeout_value 

    The executor must be started for this command to be successful.
     
  5. If connection replication is defined for a port, all the connection records for the port are transferred after a takeover. To reduce this processing, old connections should not remain in memory for an extended time. In particular, LDAP ports typically have long staletimeout periods (more than one day). A long staletimeout keeps old connections in memory, so more connection records are passed at synchronization and both machines use more memory. If replication is required with a large staletimeout value, consider affinity forwarding and replicating the affinity records instead of the connection records. Because there are fewer affinity records, less processing is required.
     
  6. When a partner machine takes over the workload, it issues a gratuitous ARP. The gratuitous ARP informs all machines on the same subnet of the new hardware (MAC) address associated with the cluster address. Ensure that your routers honor gratuitous ARPs and update their ARP caches; otherwise, requests continue to be sent to the inactive partner.
  7. Select manual recovery mode to minimize takeover operations. With automatic recovery mode, the primary Load Balancer is preferred for forwarding activity, so an additional takeover occurs to move active load balancing back to the primary Load Balancer when it recovers.
  8. For the high availability port number, select a port value higher than 10000 to prevent conflict with ports reserved for other applications.
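
The ordering rule in tip 1 can be checked before a hand-edited script file is run. The following Python sketch is a hypothetical helper, not part of Load Balancer. It assumes that each statement occupies its own line and begins with "dscontrol" followed by the full object keyword (cluster, port, server, highavailability), as in the command examples in tip 1, and it does not handle the collocated-server exceptions described in tip 2.

```python
# Objects that must be defined before any high availability statement (tip 1).
PREREQUISITE_OBJECTS = {"cluster", "port", "server"}


def find_misordered_lines(config_text):
    """Return (line_number, statement) for each cluster, port, or server
    statement that appears after the first highavailability statement.

    Assumption: one statement per line, written as "dscontrol <object> ..."
    with the full object keyword; all other lines are ignored.
    """
    first_ha_line = None
    misordered = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        words = raw.split()
        if len(words) < 2 or words[0] != "dscontrol":
            continue  # blank lines, comments, and non-dscontrol lines
        obj = words[1]
        if obj == "highavailability":
            if first_ha_line is None:
                first_ha_line = number
        elif obj in PREREQUISITE_OBJECTS and first_ha_line is not None:
            misordered.append((number, raw.strip()))
    return misordered


# Hypothetical script with the statements in the wrong order ("..." marks
# elided arguments, as in the examples in this technote).
sample = """\
dscontrol executor start
dscontrol highavailability heartbeat add ...
dscontrol cluster add ...
dscontrol port add ...
"""
print(find_misordered_lines(sample))
# [(3, 'dscontrol cluster add ...'), (4, 'dscontrol port add ...')]
```

An empty list means that no cluster, port, or server statement follows the first high availability statement. Letting Load Balancer write the file with its save function remains the recommended approach; a check like this is only a safeguard for scripts edited by hand.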
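
The heartbeat defaults in tip 4 imply that four consecutive half-second heartbeat intervals can pass without a reply before a failure is declared (2 / 0.5 = 4). The following Python sketch works through that arithmetic and one hypothetical way to size a larger value. It assumes that the 0.5-second interval stays fixed and that hatimeout is expressed in whole seconds; the helper names and the "longest observed stall" measurement are illustrative only.

```python
import math

# Defaults described in tip 4: a heartbeat every half second, and a failure
# declared after 2 seconds without one (the hatimeout value).
HEARTBEAT_INTERVAL_S = 0.5
DEFAULT_HATIMEOUT_S = 2


def missed_heartbeats_tolerated(hatimeout_s, interval_s=HEARTBEAT_INTERVAL_S):
    """Consecutive heartbeat intervals that can elapse without a reply
    before the partner is declared failed."""
    return int(hatimeout_s // interval_s)


def suggested_hatimeout(longest_stall_s, margin_s=1.0, minimum_s=DEFAULT_HATIMEOUT_S):
    """Hypothetical sizing rule: the longest observed gap in heartbeat replies
    plus a safety margin, rounded up to whole seconds (assumed unit) and
    never below the default."""
    return max(minimum_s, math.ceil(longest_stall_s + margin_s))


print(missed_heartbeats_tolerated(DEFAULT_HATIMEOUT_S))  # 4
print(suggested_hatimeout(3.2))  # 5
```

The result would then be applied with "dscontrol executor set hatimeout". A larger timeout also delays detection of a genuine failure, so the value is a trade-off between avoiding false failovers and taking over quickly.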
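
The effect of a large staletimeout value described in tip 5 can be estimated with Little's law: the average number of records held equals the rate at which they are created multiplied by how long each one stays. The following Python sketch gives a rough upper bound under stated assumptions: every connection is assumed to stay idle until staletimeout expires, and one affinity record is assumed per distinct client address. The traffic figures are hypothetical.

```python
def connection_records(new_connections_per_s, record_lifetime_s):
    """Little's law: average records held = arrival rate x time each record stays.
    Worst-case assumption: every connection lingers for the full staletimeout."""
    return new_connections_per_s * record_lifetime_s


def affinity_records(client_addresses):
    """Assumption: one affinity record per distinct client address on the port."""
    return len(set(client_addresses))


# Hypothetical LDAP port: 50 new connections per second, staletimeout of one day.
ONE_DAY_S = 24 * 60 * 60
print(connection_records(50, ONE_DAY_S))  # 4320000

# Hypothetical client population: the same three clients reconnect repeatedly.
clients = ["10.0.0.1", "10.0.0.2", "10.0.0.3"] * 1000
print(affinity_records(clients))  # 3
```

Under these assumptions, the connection table grows with the connection rate and the staletimeout, while the affinity table grows only with the number of distinct clients. This is why replicating affinity records usually means far fewer records to synchronize.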
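
Tip 6 depends on routers accepting the gratuitous ARP that the newly active partner sends. Load Balancer builds and sends this frame itself. The following Python sketch only shows the layout of one common form, an ARP request in which the sender and target protocol addresses are both the announced address, which can help when you examine a packet capture on the router's subnet. The MAC address and the 192.0.2.10 cluster address are hypothetical.

```python
import socket
import struct


def gratuitous_arp_frame(sender_mac, announced_ip):
    """Build an Ethernet frame carrying a gratuitous ARP request.

    sender_mac: 6-byte hardware address of the newly active machine.
    announced_ip: dotted-quad IPv4 address being announced (the cluster address).
    """
    if len(sender_mac) != 6:
        raise ValueError("sender_mac must be 6 bytes")
    ip = socket.inet_aton(announced_ip)
    # Ethernet header: broadcast destination, sender as source, EtherType 0x0806 (ARP).
    ethernet = struct.pack("!6s6sH", b"\xff" * 6, sender_mac, 0x0806)
    # ARP payload: hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # address lengths 6 and 4, operation 1 (request). The sender and target
    # protocol addresses are both the announced address; the target hardware
    # address is left as zeros.
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 1,
        sender_mac, ip,
        b"\x00" * 6, ip,
    )
    # 42 bytes in total; it is typically padded to the 60-byte Ethernet
    # minimum (excluding FCS) before transmission.
    return ethernet + arp


frame = gratuitous_arp_frame(bytes.fromhex("020000000001"), "192.0.2.10")
print(len(frame))  # 42
```

In a capture, look for the cluster address appearing as both the sender and the target IP address, with the hardware address of the newly active Load Balancer as the sender MAC. If routers keep forwarding to the old hardware address after such a frame is seen, they are not honoring gratuitous ARPs.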

Document Information

Modified date:
18 May 2022

UID

swg21211427