How should the ICMP monitor and its elements be configured and tuned for high-volume monitoring?
The main factors affecting ICMP monitor performance are the number of pings that must be sent to complete the current configuration and the time interval in which they must be completed.
The number of pings and timing intervals are controlled with the following element parameters:
- numberofpings: This configures how many pings are going to be sent for each element test. The default is 5.
- retries: This is the number of times a failed ping will be retried. It should be used to filter out dropped packets. The default is 0.
- failureretests: This is the number of times an element is retested if it fails. It is used to filter out transient failures. The default is 0.
- retestinterval: How long to wait, in seconds, before testing a failed element again. Only used if retests are configured. The default is 0.
- packetinterval: This is the delay in seconds between pings within an element. The default is 1.
- poll: The time in seconds between tests of the element. The default is 300.
Based on these configuration settings, the total number of required pings can be calculated. The calculations assume that all elements are configured with the same options.
Best case scenario (all tests pass):
Total Pings = Number of elements * number of pings
Worst case scenario (all tests fail):
Total Pings = Number of elements * (number of pings * (retries + 1)) * (failure retests + 1)
Using the default configuration values, testing 1000 hosts requires the following number of pings:
Total Pings = 1000 * 5 = 5000 pings
If the elements were configured to be retested twice on failure the maximum number of pings would then be:
Total Pings = 1000 * (5 * (0 + 1)) * (2 + 1) = 15000
Adding a single retry for failed pings on top of this would require:
Total Pings = 1000 * (5 * (1 + 1)) * (2 + 1) = 30000
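The best case and worst case formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the monitor; the function names are made up for this example, and the arguments mirror the element parameters numberofpings, retries and failureretests.

```python
def best_case_pings(elements: int, numberofpings: int = 5) -> int:
    """Total pings when every test passes."""
    return elements * numberofpings


def worst_case_pings(elements: int, numberofpings: int = 5,
                     retries: int = 0, failureretests: int = 0) -> int:
    """Total pings when every test fails, so every retry and retest is used."""
    return elements * (numberofpings * (retries + 1)) * (failureretests + 1)


if __name__ == "__main__":
    print(best_case_pings(1000))                               # 5000
    print(worst_case_pings(1000, failureretests=2))            # 15000
    print(worst_case_pings(1000, retries=1, failureretests=2)) # 30000
```

The three printed values match the three worked examples above for 1000 hosts.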
As the number of hosts to be polled increases, there are two main ways to tune the configuration: increase the poll interval to provide a larger time window in which to send the required pings, and/or reduce the number of pings sent per test. To reduce the number of pings required to complete an element, keep the use of retries and retests to a minimum. Additionally, consider reducing the number of pings per element from the default of 5.
The maximum throughput of the monitor can be configured using the following monitor properties:
- PingsPerSec: This is the maximum number of pings that the monitor will send every second if required. The default is 100.
- IntraPingWait: This is the time interval in milliseconds that the monitor will wait between sending pings. The default is 0.
Using the calculated total pings value, you can estimate a PingsPerSec setting by dividing the Total Pings by the poll interval in seconds. For the worst case of 30000 pings and the default poll of 300:
PingsPerSec = 30000 / 300 = 100
This is only a rough estimate, as it does not take into account the additional wait times for retests (retestinterval) and the intervals between pings in a test (packetinterval). Additionally, this calculation is for the worst case scenario, where every element sends the maximum possible number of pings. The total pings value can be adjusted to take into account the expected failure rate in your environment.
For example, assuming a 10% failure rate:
Total Pings = 0.9 * Best Case Pings + 0.1 * Worst Case Pings = 0.9 * 5000 + 0.1 * 30000 = 7500
PingsPerSec = 7500 / 300 = 25
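The failure-rate weighting and the PingsPerSec estimate can be combined into a small helper. This is a minimal sketch of the calculation above; the function names are hypothetical, and rounding the estimate up to a whole number of pings per second is an assumption made here, not documented monitor behavior.

```python
def expected_total_pings(best_case: int, worst_case: int,
                         failure_rate: float) -> int:
    """Weight best and worst case totals by the expected failure rate (0.0 to 1.0)."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0.0 and 1.0")
    return round((1.0 - failure_rate) * best_case + failure_rate * worst_case)


def estimate_pings_per_sec(total_pings: int, poll_seconds: int) -> int:
    """Rough PingsPerSec estimate: total pings divided by the poll interval, rounded up."""
    if poll_seconds <= 0:
        raise ValueError("poll_seconds must be positive")
    return (total_pings + poll_seconds - 1) // poll_seconds


if __name__ == "__main__":
    total = expected_total_pings(5000, 30000, 0.10)
    print(total)                               # 7500
    print(estimate_pings_per_sec(total, 300))  # 25
    print(estimate_pings_per_sec(30000, 300))  # 100 (pure worst case)
```

As with the manual calculation, the result ignores retestinterval and packetinterval waits, so treat it as a starting point for tuning rather than an exact requirement.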
Take care when setting PingsPerSec to large values (1000 or more), as this can cause the ICMP monitor to flood the network with ICMP traffic. In some cases network infrastructure detects this as a ping flood and drops packets, which interferes with the operation of the monitor. If this happens, the IntraPingWait option might need to be enabled, which causes the monitor to space out the pings it sends. Using this option limits the throughput of the monitor, so it should only be used if absolutely necessary. It should be set using the following formula:
IntraPingWait < 1000 / PingsPerSec
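The bound above follows from simple arithmetic: a wait of 1000 / PingsPerSec milliseconds or more between pings would by itself make the configured PingsPerSec rate unreachable. A small check along these lines can be used when choosing a value; the helper names are hypothetical and the code only illustrates the inequality.

```python
def intra_ping_wait_limit_ms(pings_per_sec: int) -> float:
    """Exclusive upper bound, in milliseconds, for IntraPingWait."""
    if pings_per_sec <= 0:
        raise ValueError("pings_per_sec must be positive")
    return 1000 / pings_per_sec


def is_valid_intra_ping_wait(wait_ms: float, pings_per_sec: int) -> bool:
    """True if wait_ms satisfies IntraPingWait < 1000 / PingsPerSec."""
    return 0 <= wait_ms < intra_ping_wait_limit_ms(pings_per_sec)


if __name__ == "__main__":
    print(intra_ping_wait_limit_ms(100))      # 10.0
    print(is_valid_intra_ping_wait(5, 100))   # True
    print(is_valid_intra_ping_wait(10, 100))  # False
```

For example, with the default PingsPerSec of 100, IntraPingWait must stay below 10 milliseconds.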
A common cause of ping failures on multihomed machines is the ICMP monitor binding to the incorrect IP address. This binding behavior can be controlled with the following properties:
- IpAddress: This determines the address that IPv4 ICMP requests will be sent from. The default is "".
- Ipv6Address: This determines the address that IPv6 ICMP requests will be sent from. The default is "nobind".