QRadar Event and Flow Burst Handling (Buffer)
How does QRadar handle events or flows that temporarily exceed my license limit?
What is Event and Flow Burst Handling?
Burst handling allows QRadar appliances to deal with spikes in data that exceed the license of the appliance by moving the excess event or flow data to a temporary queue for processing. This feature acts as a pressure valve for event and flow data to prevent an appliance from dropping events when it temporarily exceeds the licensed event or flow rate. When a system goes over its license limit, burst handling starts moving the excess event or flow data to the temporary queue in an attempt to prevent any dropped events. At the same time, a system notification is generated to alert the QRadar administrator that an appliance has exceeded its license limit.
What is the size of the temporary queue?
By default, the event queue for QRadar 7.2.3 is 500 MB of data. QRadar 7.2.4 and above introduces flow burst handling. The QRadar 7.2.4 update also expands the size of the temporary queue for events and flows from 500 MB to 5 GB for each queue. Event or flow data that exceeds the license limit is always added to the end of the temporary queue and processed in the order that the data arrived. This can be thought of as a first in, first out (FIFO) method of processing the data. The appliance continues to process data in order, and any data over capacity is added to the end of the temporary queue. As the data rate declines, the system uses the difference between the license limit and the current data rate to reduce the temporary queue as fast as possible. The rate at which the temporary queue fills or empties varies depending on the appliance license limit, the magnitude of the spike, the payload size, the duration of the spike, and other factors.
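The fill and drain behavior described above can be sketched as a small simulation. This is an illustrative model only, not QRadar's implementation: the function name simulate_burst is hypothetical, the queue capacity is expressed in events rather than bytes, all arrivals are treated as a single FIFO stream, and the sketch assumes that data arriving when the queue is full is dropped. The rates are scaled down (5 instead of 5,000 EPS) to keep the run small.

```python
from collections import deque


def simulate_burst(rates_per_second, license_limit, queue_capacity):
    """Model a license-limited processor with a FIFO temporary queue.

    rates_per_second: incoming event counts, one entry per second.
    license_limit: events the appliance may process per second.
    queue_capacity: assumed queue size in events (the real queue is sized in bytes).
    Returns (backlog after each second, total events dropped).
    """
    queue = deque()  # arrival timestamps, oldest on the left
    depths = []
    dropped = 0
    for second, incoming in enumerate(rates_per_second):
        # New arrivals always join the back of the queue.
        queue.extend([second] * incoming)
        # Process up to the license limit, oldest data first (FIFO).
        for _ in range(min(license_limit, len(queue))):
            queue.popleft()
        # Assumption: whatever does not fit in the temporary queue is dropped.
        while len(queue) > queue_capacity:
            queue.pop()
            dropped += 1
        depths.append(len(queue))
    return depths, dropped


# Scaled-down spike: normal rate 4, license limit 5, three seconds at 6.
depths, dropped = simulate_burst([4, 4, 6, 6, 6, 4, 4, 4, 4], 5, 10)
print(depths)   # [0, 0, 1, 2, 3, 2, 1, 0, 0]
print(dropped)  # 0
```

The backlog grows by one event per second during the spike and shrinks by one per second afterward, because the spare capacity (5 - 4) matches the overage (6 - 5) in this example.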
Can you provide an example of Burst Handling?
For example, a corporate network has a QRadar 1828 Event/Flow Processor appliance that is rated for 5,000 events per second (EPS) and 100,000 flows per minute (FPM). Typically, this appliance sees on average 4,000 EPS and 70,000 FPM. Every morning between 8am and 9am, the corporate network experiences an event and flow spike due to users logging in, accessing network resources, collecting email, and other normal activities. During this interval, which peaks around 9am, the appliance sees an event spike at 6,000 EPS and a flow spike that exceeds the 100,000 FPM limit. The appliance detects the excess data, generates a notification, and pushes the excess data to the temporary queue.
Figure 1: Example of an event spike seen during morning business hours.
Figure 2: Example of a flow spike seen during morning business hours.
How does the system recover from a spike in data?
The temporary queue for event and flow data empties in the order that the data arrived. This means that the oldest data is at the front of the queue for processing and the newest data is at the back of the queue. After the event or flow data spike is over, the system uses the difference between the license limit of the appliance and the current data rate to empty the queue. This is identified as the "Recovery" interval in Figure 1 and Figure 2 above. The amount of time it takes to process the data and empty the queue depends on the recovery rate and the volume of data that needs to be processed. The recovery rate is defined as the gap between the appliance license limit and the incoming data rate.
For example, see Figure 1. In this scenario, the system is licensed for 5,000 EPS and experiences an event rate of 6,000 EPS, so events are queued in order of arrival while the system is over license. When the event rate returns to the normal ~4,000 EPS, the system uses the difference of ~1,000 EPS to empty the 500 MB or 5 GB queue (the queue size depends on the QRadar version). The same logic applies to flows: recovery rate = license limit - current incoming rate.
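The recovery arithmetic can be worked through in a few lines. This is a rough sketch under stated assumptions: the helper names recovery_rate and drain_seconds are hypothetical, the backlog is counted in events rather than bytes, and the 10-minute spike duration is an assumed value that does not come from the example above.

```python
def recovery_rate(license_limit, incoming_rate):
    """Spare capacity available to drain the queue (0 when at or over license)."""
    return max(0, license_limit - incoming_rate)


def drain_seconds(backlog_events, license_limit, incoming_rate):
    """Seconds needed to empty a backlog while the incoming rate stays steady."""
    if backlog_events == 0:
        return 0.0
    rate = recovery_rate(license_limit, incoming_rate)
    if rate == 0:
        return float("inf")  # no spare capacity, so the queue never empties
    return backlog_events / rate


# Example values: 5,000 EPS license, 6,000 EPS spike, 4,000 EPS normal rate.
# Assumption: the spike holds at 6,000 EPS for 10 minutes.
backlog = (6000 - 5000) * 10 * 60

print(recovery_rate(5000, 4000))           # 1000
print(drain_seconds(backlog, 5000, 4000))  # 600.0
```

Under these assumptions, a spike that runs 1,000 EPS over license for 10 minutes takes another 10 minutes to clear, because the recovery rate happens to equal the overage.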
License Sizing and Why it is Important
Appliances should be sized to have room above the standard EPS or FPM rate to be able to deal with periods of high event or flow traffic. The recovery rate is important because the smaller the recovery rate, the longer it takes to empty the temporary queue. Offenses are not generated until the data is processed by the appliance, so the longer it takes to process the temporary queue, the longer it might take for an offense to be generated.
The closer your average EPS or FPM rate is to the license limit of the appliance, the longer it can take to process the events from the temporary queue and the more time you spend filling the queue. Systems that run closer to their license limit during normal operation take longer to return to a normal operating condition. For example, a QRadar appliance with a 10,000 EPS license limit takes longer to empty the temporary queue when the average EPS rate is 9,500 than a system where the average EPS rate is 7,000, because the recovery rate is 500 EPS instead of 3,000 EPS.
Increasing the queue size will not resolve issues where systems continuously exceed their license capabilities because the excess data is added to the end of the temporary queue where it must wait to be processed. The larger the queue, the longer it will take those queued events to be processed by the appliance. The key to dealing with excess data is to have a system with enough license room to balance spikes in the event or flow rate to quickly process the queued data.
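To illustrate why a larger queue means a longer wait, the sketch below estimates how long a completely full queue takes to drain at a fixed recovery rate. It rests on assumptions that are not in this article: a hypothetical average payload of 500 bytes per event, decimal units (1 MB = 10^6 bytes), a steady recovery rate of 1,000 EPS, and the invented helper name full_queue_drain_seconds.

```python
def full_queue_drain_seconds(queue_bytes, avg_event_bytes, recovery_eps):
    """Approximate time to empty a full temporary queue.

    The newest event sits at the back of the FIFO queue, so this is also
    roughly how long that event waits before it is processed.
    """
    queued_events = queue_bytes / avg_event_bytes
    return queued_events / recovery_eps


AVG_EVENT_BYTES = 500  # hypothetical average payload size
RECOVERY_EPS = 1000    # assumed spare capacity (license limit - incoming rate)

for label, size in (("500 MB", 500 * 10**6), ("5 GB", 5 * 10**9)):
    wait = full_queue_drain_seconds(size, AVG_EVENT_BYTES, RECOVERY_EPS)
    print(f"{label} queue: about {wait:.0f} s")
```

With these assumed values, the 500 MB queue drains in roughly 1,000 seconds and the 5 GB queue in roughly 10,000 seconds. Real figures depend on the actual payload sizes and data rates.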
Burst handling for excess events and flows helps the system deal with spikes in data and prevents dropped event or flow data. The best way to deal with spikes in data is to ensure that your deployment is properly sized for the event and flow rates in your network. If your system is continuously over license, administrators will repeatedly receive system notifications about being over license. In situations where your system is continually going over license, you can review the QRadar Troubleshooting System Notifications Guide, contact an IBM Sales Representative, or discuss your notifications with IBM Support.
Where do I find more information?
If you have additional questions or some of this content is not clear, you can visit the QRadar forum or contact customer support.
More support for:
IBM Security QRadar SIEM
Software version: 7.1, 7.2
Operating system(s): Linux
Software edition: All Editions
Reference #: 1687020
Modified date: 25 November 2014