Best practice for number of TEM Clients that a TEM Relay should have reporting to it.
Resolving the problem
The standard best practice recommendation is that each Relay can support 500-1000 Clients. This recommendation maximizes the speed and responsiveness of TEM Clients while ensuring that no individual Relay (Relays typically run on shared computers with other duties) experiences too much load.
There is no published maximum number of Clients per Relay, and many customers use more than 1000 Clients per Relay. Tests confirm that Relays do not fail under very high Client load. However, there are several downsides as the number of Clients per Relay increases:
- Client responsiveness will decrease because the Relay will not be able to service all Clients simultaneously (a typical Relay can handle 1000 simultaneous connections). This will result in slower action response times and file distribution times.
- Relay infrastructure will become less fault-resistant because, if a Relay fails (for any reason), each of the remaining Relays will have many more Clients failing over to it.
- Relay CPU and overall resource usage will increase as a Relay has to handle more Clients.
- If Relays are handling too many Clients, it often means that many Clients are connecting over (potentially slow) WAN connections, which is sub-optimal for network bandwidth.
- If a Relay is handling many Clients, it may no longer be a good candidate for a shared server.
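The sizing and failover points above can be checked with some quick arithmetic. The following is a minimal sketch, not part of any TEM or BigFix tooling: the function names are hypothetical, and it assumes that Clients from a failed Relay spread evenly across the surviving Relays, which a real deployment may not do.

```python
import math


def relays_needed(clients: int, clients_per_relay: int = 1000) -> int:
    """Smallest number of Relays that keeps each at or below clients_per_relay."""
    return math.ceil(clients / clients_per_relay)


def load_after_failure(clients: int, relays: int, failed: int = 1) -> float:
    """Average Clients per surviving Relay, assuming an even failover spread."""
    surviving = relays - failed
    if surviving <= 0:
        raise ValueError("no Relays left to take over the Clients")
    return clients / surviving


if __name__ == "__main__":
    print(relays_needed(10_000))           # 10 Relays at 1000 Clients each
    print(relays_needed(10_000, 500))      # 20 Relays at 500 Clients each
    print(load_after_failure(10_000, 10))  # about 1111 Clients per Relay
    print(load_after_failure(10_000, 2))   # 10000.0 Clients on the one survivor
```

With 10 Relays, losing one raises the average load only slightly above the recommended range. With 2 Relays, losing one puts every Client on a single Relay.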
Here is a hypothetical scenario to illustrate two deployments with 10,000 Clients (for the purposes of this example, we will assume high-speed LAN connections to all Clients):
- Deployment A has 10 Relays with 1,000 Clients per Relay, as recommended. If an action with a 1 MB payload is pushed to all Clients, each Relay can serve the file to all of its Clients at the same time. Because network speeds are fast, the package can be pushed to all 10,000 computers, which can report their results within just a few minutes, and the Relays then quickly return to a low-resource idle state.
- Deployment B has 1 Relay that serves 10,000 computers. If the same 1 MB action is published to all computers, the Relay can service only 1,000 Clients at a time, so 9,000 Clients fail to connect on the first try. After the first 1,000 Clients receive the download package, the other 9,000 Clients try to connect again, but again only 1,000 are serviced and the remaining 8,000 retry (and so on). The Client retry behavior controls how long each Client waits after each successive failure to connect to the Relay.
So, rather than delivering the package in parallel and getting rapid responses from all computers in a few minutes as in Deployment A, Deployment B would likely take much longer (probably several hours, because the Client retry wait doubles after each failure). The Relay would be under high load for a long time. From the Console operator's view, the Clients would appear unresponsive and would seem to report their results sporadically.
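The difference between the two deployments can be illustrated with a rough simulation. This is a simplified sketch under stated assumptions, not a model of the actual Client implementation: all waiting Clients retry in lockstep, the Relay serves up to 1,000 Clients per attempt, and the 30-second initial retry wait is a hypothetical value chosen only for illustration (the real Client retry settings may differ).

```python
def simulate_delivery(clients: int, capacity: int = 1000,
                      base_wait_s: float = 30.0) -> tuple[int, float]:
    """Return (attempt rounds, total retry wait in seconds) for one Relay.

    Assumptions (hypothetical): Clients retry in lockstep, and the retry
    wait starts at base_wait_s and doubles after each failed attempt.
    """
    remaining = clients
    rounds = 0
    total_wait_s = 0.0
    wait_s = base_wait_s
    while remaining > 0:
        rounds += 1
        remaining -= min(capacity, remaining)  # Relay serves up to capacity
        if remaining > 0:
            total_wait_s += wait_s  # unserved Clients wait, then retry
            wait_s *= 2             # wait doubles after each failure
    return rounds, total_wait_s


if __name__ == "__main__":
    # Deployment A: each of the 10 Relays serves its 1,000 Clients at once.
    print(simulate_delivery(1_000))   # (1, 0.0)
    # Deployment B: a single Relay serves all 10,000 Clients.
    rounds, wait_s = simulate_delivery(10_000)
    print(rounds, wait_s / 3600)      # 10 rounds, roughly 4.3 hours of waiting
```

Under these assumptions the last Clients in Deployment B fail nine times before being served, and the doubling wait accounts for nearly all of the delay. The exact figure depends entirely on the assumed initial wait.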
Note that Clients write connection and timeout errors to the Client logs when they are unable to connect to a Relay, and this is much more likely when a Relay services more than 1000 Clients. These intermittent errors can be safely ignored.
Important: If you are considering using a BigFix Relay to handle significantly more Clients than the recommended values, please contact BigFix Professional Services for assistance configuring your deployment and Relays to better handle the high load.