IBM Support

Best practices for configuring a reliable and scalable peer group for quota enforcement

Technote (FAQ)


Question

What are the best practices for configuring a reliable and scalable peer group for quota enforcement?

Cause

Improper changes to the peer group configuration for quota enforcement can cause unexpected results in the peer group.

Answer

To avoid unexpected results when operational failures occur or when the configuration of a quota enforcement peer group is changed improperly, administrators can follow these best practices:

Peer group requirements for quota enforcement configuration:

  • A peer group must contain at least three peers so that quota enforcement can fail over when the master becomes unavailable. Failover occurs automatically when both of the following conditions are met:
    1. The master failure is detected when two slaves agree that the master is not reachable after a timeout of 10 seconds.
    2. More than half of all peers in the peer group are reachable.
    When failover is performed after 10 seconds, a new master is selected, and the other slaves are informed of the new master address when they connect to peers. After failover, all data changes are written to the new master, and the new master synchronizes the data across the peer group. When the original master becomes operational again, it works as a slave in the peer group.
  • Make sure that all peers in the peer group use identical GatewayScript files. Identical files ensure that the threshold for a specific traffic type is the same across the peer group.
  • Based on your requirements for quota enforcement, decide whether to enable or disable strict mode. When the master becomes unavailable, the slaves lose their connection to it until failover occurs. During this interval, each slave behaves according to its strict mode setting:
    • Strict mode enabled: The slave cannot process the request.
    • Strict mode disabled: If service performance and availability are more important than data consistency, you can disable strict mode for the slave so that it can process the request locally. The slave writes the threshold and associated metadata to its local data storage, which can affect I/O transactions. After failover occurs, the connection between the slaves and the new master resumes. When the new master synchronizes data to all slaves, it can overwrite the threshold and associated metadata that the slave stored locally, so data consistency across the peer group can be affected.
  • Decide whether to store data in memory or on the RAID volume. The threshold, the counter, and their associated metadata can be persisted on the RAID volume or stored in memory. When quota enforcement works in peer group mode, all peers must use the same data storage location: either all peers store data on the RAID volume, or all peers store data in memory. A peer group cannot mix RAID volume and in-memory storage.
    If all peers use in-memory storage, the following behaviors occur:
    • After you configure the peer group for quota enforcement, follow these rules when you reconfigure or manually reboot a peer:
      • When you reconfigure or reboot the master, first switch a slave to master. Then, reconfigure or reboot the original master. In this case, the previously stored data remains in the memory of the new master.
      • When you reconfigure or reboot a slave, the slave synchronizes data with the master. In this case, the previously stored data remains in the memory of the master.
    • If the master becomes unavailable and is automatically restarted within the 10-second master timeout, that is, before failover occurs, its database is empty when it resumes. The slaves then synchronize data with the resumed master, so the previously stored data is lost.
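The failover conditions and the strict mode behavior described in this list can be modeled in a few lines of Python. This is a minimal sketch, not DataPower code: `PeerGroupView`, `has_majority`, `failover_allowed`, and `slave_handles_request` are hypothetical names, and reading "two slaves agree" as "at least two slaves report the master unreachable" is an assumption. The example also shows why three peers is the minimum: with two peers, the one surviving slave is exactly half of the group, not more than half.

```python
from dataclasses import dataclass

# Master timeout stated in this technote.
MASTER_TIMEOUT_S = 10


@dataclass
class PeerGroupView:
    """Hypothetical snapshot of the peer group while the master is unreachable."""
    total_peers: int                   # all configured members of the peer group
    reachable_peers: int               # members that can still reach each other
    slaves_reporting_master_down: int  # slaves that cannot reach the master
    seconds_master_unreachable: float  # how long the master has been unreachable


def has_majority(total_peers: int, reachable_peers: int) -> bool:
    """True when more than half of all peers are reachable."""
    return reachable_peers * 2 > total_peers


def failover_allowed(view: PeerGroupView) -> bool:
    """Both conditions: master failure detected AND a reachable majority."""
    master_failure_detected = (
        view.slaves_reporting_master_down >= 2
        and view.seconds_master_unreachable >= MASTER_TIMEOUT_S
    )
    return master_failure_detected and has_majority(
        view.total_peers, view.reachable_peers
    )


def slave_handles_request(strict_mode: bool) -> str:
    """What a slave does with a request while the master is down, before failover."""
    if strict_mode:
        return "reject"  # strict mode: the slave cannot process the request
    # Non-strict mode: process locally; the new master may later overwrite this data.
    return "process locally"


# Three peers: the master fails and both slaves still reach each other.
three = PeerGroupView(total_peers=3, reachable_peers=2,
                      slaves_reporting_master_down=2, seconds_master_unreachable=10)
print(failover_allowed(three))  # True

# Two peers: a single surviving slave is not "more than half" of two.
two = PeerGroupView(total_peers=2, reachable_peers=1,
                    slaves_reporting_master_down=1, seconds_master_unreachable=10)
print(failover_allowed(two))  # False
```

The majority test uses integer arithmetic (`reachable * 2 > total`) so that it stays exact for both odd and even group sizes.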

When you configure a peer group for quota enforcement, follow these rules:
  1. Make sure that the administrative state is enabled. If it is not, enable it.
  2. When you create a peer group, add peer members to the Peers list one at a time, starting the peers in sequence. Each peer connects to the other peers in the order in which they are specified in the Peers list. When you stop part of or the whole peer group, remove peers one at a time by disabling their quota enforcement servers. You cannot start or stop all peers at the same time.
  3. You can specify whether data is persisted on the RAID volume or stored in memory.
    • For persistent storage, select the RAID volume, which must be the raid0 RAID volume.
    • For in-memory storage, do not select the RAID volume. By default, the data storage is in-memory.
  4. The priority affects only the result of failover; it does not affect the role of a peer that joins a peer group. When the peer group is down, you must restart the peers one at a time. You cannot restart all peers at the same time. Failover occurs when more than half of all peers are running again and reachable. In this situation, the peer that resumes first becomes the new master.
  5. All peers in the peer group must use the same SSL configuration. The following settings must be identical on all peers: SSL enablement state, key alias, and certificate alias.
    • To change the SSL enablement state from enabled to disabled, follow these steps:
      1. Disable the quota enforcement server on all peers.
      2. Make sure that you disable SSL on all peers.
      3. Enable the quota enforcement server on the master.
      4. Enable the quota enforcement server on slaves one by one.
      5. Check the quota enforcement server status provider on all peers to make sure that all peers are operational and in peer group mode.
    • To change the SSL enablement state from disabled to enabled, follow these steps:
      1. Disable the quota enforcement server on all peers.
      2. Make sure that you enable SSL on all peers, and make sure that all peers use the same key alias and certificate alias.
      3. Enable the quota enforcement server on the master.
      4. Enable the quota enforcement server on slaves one by one.
      5. Check the quota enforcement server status provider on all peers to make sure that all peers are operational and in peer group mode.
  6. All peers must use the same strict mode.
For details, see the Quota enforcement topic in Knowledge Center.
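The rules in this technote that require matching settings (same data storage location, strict mode, SSL enablement state, key alias, certificate alias, and GatewayScript files, with at least three peers) lend themselves to a mechanical pre-flight check. The following Python is a minimal sketch under stated assumptions: `PeerSettings` and `check_consistency` are hypothetical, and how you collect each appliance's settings into these records is outside the sketch. GatewayScript files are compared by SHA-256 digest so that byte-for-byte differences are caught.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class PeerSettings:
    """Hypothetical record of the per-peer settings that must match."""
    name: str
    raid_volume: Optional[str]  # None = in-memory storage, "raid0" = persisted
    strict_mode: bool
    ssl_enabled: bool
    key_alias: Optional[str]
    cert_alias: Optional[str]
    gatewayscript: bytes  # contents of the GatewayScript file used by the peer


def check_consistency(peers: List[PeerSettings]) -> List[str]:
    """Return a list of problems; an empty list means the peers agree."""
    problems: List[str] = []
    if len(peers) < 3:
        problems.append("at least three peers are required for failover")
    if any(p.raid_volume not in (None, "raid0") for p in peers):
        problems.append("persistent storage must use the raid0 RAID volume")
    # Each check extracts the value that must be identical on every peer.
    checks: Dict[str, Callable[[PeerSettings], object]] = {
        "data storage (RAID volume vs memory)": lambda p: p.raid_volume is not None,
        "strict mode": lambda p: p.strict_mode,
        "SSL enablement state": lambda p: p.ssl_enabled,
        # Aliases only matter when SSL is enabled.
        "key alias": lambda p: p.key_alias if p.ssl_enabled else None,
        "certificate alias": lambda p: p.cert_alias if p.ssl_enabled else None,
        "GatewayScript file": lambda p: hashlib.sha256(p.gatewayscript).hexdigest(),
    }
    for label, value_of in checks.items():
        if len({value_of(p) for p in peers}) > 1:
            names = ", ".join(f"{p.name}={value_of(p)!r}" for p in peers)
            problems.append(f"{label} differs: {names}")
    return problems


script = b"/* identical rate limit script */"
peers = [PeerSettings(f"qe{i}", None, True, True, "qe-key", "qe-cert", script)
         for i in (1, 2, 3)]
print(check_consistency(peers))  # []
```

Running such a check before enabling the quota enforcement server on each peer surfaces mismatches as a list of messages instead of as unexpected failover behavior later.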

To avoid unexpected results, take the following situations into account:
  • If you disable the network interface (Ethernet or VLAN) on a peer that runs a service with quota enforcement enabled, in-flight transactions can be blocked. When the network interface is disabled on the master, the slaves can take a long time to elect a new master. To avoid these problems during maintenance, switch the peer from master to slave, quiesce the service, and then disable the network interface.
  • To safely remove a peer from the peer group, make sure that the role of the target peer is slave. If the target peer is the master, first make a suitable slave the master by manually running the quota-enforcement-switch-master command. Then, remove the target peer (the original master) by changing the operational state of its quota enforcement server to down. Removing a peer, intentionally or accidentally, while the operational state of its quota enforcement server is up can affect the result of failover, because the removed peer is still considered a member of the peer group.
  • If a peer starts or restarts without any valid peers to connect to, it fails to join or rejoin the existing peer group and becomes a master on its own. To avoid such failures, add as many valid peers as possible to the Peers list. This increases the chance that a joining peer connects successfully to existing peers.
  • In strict mode, carefully configure the IBM DataPower Gateway throttle-threshold to protect against out-of-memory conditions. Set the throttle-threshold more conservatively than the default value of 20%, which leaves a larger memory buffer and lowers the risk. For details about the throttle-threshold, see the Configuring throttle settings topic in Knowledge Center.
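The maintenance and removal guidance above depends on doing things in a fixed order, with the master role handed off first. The following Python is a minimal sketch that turns that order into a step list; the function names are hypothetical, the step strings are descriptive text rather than CLI syntax, and the "confirm the new roles" step is an added safety check, not a requirement stated in this technote. Only the quota-enforcement-switch-master command comes from the source, and which appliance you run it on is not specified here. Choosing the alphabetically first slave is an arbitrary placeholder for "a suitable slave".

```python
from typing import Dict, List


def _hand_off_master(peer: str, roles: Dict[str, str]) -> List[str]:
    """If `peer` is the master, return steps that move the master role to a slave."""
    if roles[peer] != "master":
        return []
    slaves = sorted(name for name, role in roles.items() if role == "slave")
    if not slaves:
        raise ValueError("no slave is available to become the master")
    # Placeholder choice: a real plan should pick the most suitable slave.
    new_master = slaves[0]
    return [
        f"run quota-enforcement-switch-master so that {new_master} becomes master",
        f"confirm that {new_master} is master and {peer} is now a slave",
    ]


def removal_plan(peer: str, roles: Dict[str, str]) -> List[str]:
    """Ordered steps to remove `peer` from the peer group safely."""
    if peer not in roles:
        raise ValueError(f"{peer} is not a member of the peer group")
    return _hand_off_master(peer, roles) + [
        f"set the operational state of the quota enforcement server on {peer} to down",
    ]


def network_maintenance_plan(peer: str, service: str, roles: Dict[str, str]) -> List[str]:
    """Ordered steps before disabling a network interface (Ethernet or VLAN) on `peer`."""
    if peer not in roles:
        raise ValueError(f"{peer} is not a member of the peer group")
    return _hand_off_master(peer, roles) + [
        f"quiesce {service} on {peer}",
        f"disable the network interface on {peer}",
    ]


roles = {"qe1": "master", "qe2": "slave", "qe3": "slave"}
for step in removal_plan("qe1", roles):
    print(step)
```

For a slave, both plans skip the hand-off and start directly with the peer's own steps.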

Tips
The following two timeouts apply under different conditions:
  • 10-second timeout
    Failover occurs when both of the following conditions are met:
    1. The master failure is detected when two slaves agree that the master is not reachable after a timeout of 10 seconds.
    2. More than half of all peers in the peer group are reachable.
  • 30-second timeout
    When there is no TCP-level acknowledgement between the master and the slaves, for example because of a network outage, transactions are terminated by the 30-second I/O timeout.
Normally, a lost connection between the master and the slaves is detected, and failover is triggered after 10 seconds. When the connection is lost, the quota enforcement server currently does not try to re-establish it for the same transaction.
  • If packets are dropped between the slaves and the master, incoming traffic can wait at the GatewayScript action until the I/O timeout (30 seconds) expires. The ratelimit module API call then returns errors.
  • If packets are rejected between the slaves and the master, the ratelimit module API call returns errors immediately.
In either case, check the response arguments of the ratelimit module API call to see whether the ratelimit policy was enforced correctly, and decide in your GatewayScript file what to do next.
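The interplay between the two timeouts can be summarized as a small timeline model. This Python sketch is illustrative only: the function names are hypothetical, the values are the 10-second and 30-second figures from this technote, and `failover_before_error` assumes the fault isolates the master and that the other failover conditions are met. It shows that with dropped packets, failover can complete while an in-flight call is still waiting, yet that call still fails because the server does not retry it against the new master.

```python
# Timeouts stated in this technote.
FAILOVER_TIMEOUT_S = 10  # master unreachable this long -> failover can start
IO_TIMEOUT_S = 30        # no TCP-level acknowledgement -> transaction terminated


def seconds_until_ratelimit_error(fault: str) -> int:
    """Approximate time before the ratelimit module API call returns an error."""
    if fault == "dropped":
        # Traffic waits at the GatewayScript action until the I/O timeout expires.
        return IO_TIMEOUT_S
    if fault == "rejected":
        # The error is returned immediately.
        return 0
    raise ValueError(f"unknown fault type: {fault!r}")


def failover_before_error(fault: str) -> bool:
    """True if failover can start while the in-flight call is still waiting.

    Even then, the in-flight transaction is not retried against the new
    master, so it still ends with an error.
    """
    return FAILOVER_TIMEOUT_S < seconds_until_ratelimit_error(fault)


for fault in ("dropped", "rejected"):
    print(fault, seconds_until_ratelimit_error(fault), failover_before_error(fault))
```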

Document information

More support for: IBM DataPower Gateways
General

Software version: 7.5

Operating system(s): Firmware

Software edition: Edition Independent

Reference #: 1981525

Modified date: 01 June 2016