IBM Support

WinCollect: Let's Talk About Log Source Event Rates & Tuning Profiles (Updated)

Question & Answer


Question

This article discusses how to tune WinCollect log sources and what the specific tuning values mean for administrators meeting event collection requirements.

Answer

What are tuning profiles and how do they work in WinCollect?


A tuning profile defines the EPS rate that a specific WinCollect log source can collect. A tuning profile can be applied to a log source at any time from the user interface of QRadar or defined at installation time when the installer automatically creates a log source. When an administrator deploys WinCollect on a system, they can tune the log sources for specific event rates to ensure that the agent does not fall behind in event collection.

Local
Local WinCollect installations default to the maximum tuning value of 5,000 EPS, which is the performance limit of a WinCollect agent. Administrators who have high event rate servers, such as Domain Controllers, can install WinCollect agents locally without the need to tune the agent configuration. Administrators who want to determine the events per second (EPS) being generated by a Windows host can review the Event Log Report tool on our GitHub page.

Remote Polling
The default tuning supports approximately 40 events per second (EPS), which is suitable for polling most endpoint systems, such as employee workstations. The maximum tuning is 2,500 EPS for remote polling, which is split across all Windows hosts that are being remotely polled. For example, if you want to remote poll 300 endpoints, you should review the EPS being generated on each host to verify you do not exceed the 2,500 EPS limit for remote polling from a single agent. Administrators who require more than 2,500 EPS can deploy multiple WinCollect agents to meet the collection requirements for their Windows environments. Administrators who want to determine the events per second (EPS) being generated by a Windows host for remote polling can review the Event Log Report tool on our GitHub page.
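The 2,500 EPS budget for remote polling can be checked with a few lines of arithmetic. The following is a minimal sketch, not an IBM tool: the function name `agents_needed` and the per-host EPS values are hypothetical, and the EPS estimates are assumed to come from your own measurements (for example, from the Event Log Report tool).

```python
import math

REMOTE_POLL_LIMIT_EPS = 2500  # per-agent remote polling maximum stated in this article


def agents_needed(host_eps, limit=REMOTE_POLL_LIMIT_EPS):
    """Return (total_eps, agent_count) for a list of per-host EPS estimates.

    agent_count is a lower bound based only on the combined EPS.
    """
    total = sum(host_eps)
    # At least one agent is needed even when the hosts are almost idle.
    return total, max(1, math.ceil(total / limit))


# Hypothetical example: 300 workstations that each average 10 EPS.
total, agents = agents_needed([10] * 300)
print(f"{total} EPS total -> at least {agents} agent(s)")
```

This sketch only considers the EPS budget. The number of channel queries per second, discussed later in this article, can impose a separate limit on how many hosts one agent polls.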

Tuning Parameters
Two tuning parameters in the QRadar log source control the approximate EPS collection capability of the WinCollect agent:

  1. Event Rate Tuning Profile - Defines the number of events polled when the agent reads from the event log. This parameter is tuned for each log source.
  2. Polling Interval (ms) - Defines the frequency with which the local/remote event log is polled for events. The lower the polling interval, the faster the WinCollect agent checks the local or remote Windows system for event updates.

    Table 1: EPS rates possible for each of the tuning parameters.

    Note: Do not configure the Polling Interval field to a value lower than 300 milliseconds.

Question: Why does the documentation state the default tuning is approximately 40 EPS?
If the administrator does not adjust the Polling Interval field and leaves the tuning at Default (Endpoint), then the EPS maximum is approximately 40 EPS. If the polling interval is adjusted so that the agent makes API requests every 1000 milliseconds (ms) instead of the default 3000 ms, then the default tuning profile can collect approximately 120 EPS. It is possible to tune the polling interval slightly lower, but the values represented in the documentation are the ones that have been tested in a lab environment. QRadar Support recommends that administrators do not attempt to tune their polling intervals below 300 ms, because a lower polling interval can cause performance issues on the Windows host.
 
For the default polling interval of 3000 ms, the approximate Events per second (EPS) rates attainable are as follows:
  • Default (Endpoint): 33-50 EPS
  • Typical Server: 166-250 EPS
  • High Event Rate Server: 416-625 EPS

For a polling interval of 1000 ms, the approximate EPS rates are as follows:
  • Default (Endpoint): 100-150 EPS
  • Typical Server: 500-750 EPS
  • High Event Rate Server: 1250-1875 EPS
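The two lists above are consistent with a simple relationship: events read per poll divided by the polling interval in seconds. The following sketch reproduces the listed ranges under that assumption. The per-poll batch sizes are inferred from the 1000 ms figures and are not stated directly in this article, and the function name `approx_eps` is hypothetical.

```python
# Events read per poll for each profile, inferred from the 1000 ms figures
# (an assumption; the article does not state these batch sizes directly).
EVENTS_PER_POLL = {
    "Default (Endpoint)": (100, 150),
    "Typical Server": (500, 750),
    "High Event Rate Server": (1250, 1875),
}


def approx_eps(profile, polling_interval_ms):
    """Approximate (low, high) EPS for a profile at a given polling interval."""
    if polling_interval_ms < 300:
        raise ValueError("polling intervals below 300 ms are not recommended")
    low, high = EVENTS_PER_POLL[profile]
    # Integer arithmetic: events per poll * 1000 ms / interval, rounded down.
    return (low * 1000 // polling_interval_ms,
            high * 1000 // polling_interval_ms)


for name in EVENTS_PER_POLL:
    print(name, approx_eps(name, 3000), approx_eps(name, 1000))
```

Treat the results as rough estimates for intervals other than 3000 ms and 1000 ms, since only those two sets of values appear in this article.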
 

Getting started with WinCollect remote polling

WinCollect can use either the MS-EVEN or MS-EVEN6 protocol to query for events. Each time the WinCollect agent needs to poll a remote Windows host the agent will create a channel to complete the query for every event log type being collected. The number of remote hosts polled can impact performance when too many channels are opened simultaneously. Too many queries per second can cause remote procedure call (RPC) errors when WinCollect attempts to remotely poll a Windows host. The WinCollect agent must open a channel to read the events from the remote event viewer based on the log source configuration set by the administrator. It is recommended that administrators do not exceed 30 channel queries per second with a WinCollect agent.

What is a channel?
A channel is created for each path in the remote event viewer when a query is created. For example, to poll the Application, System, and Security logs, the WinCollect agent needs to open three channels to the remote host. Each log type that is specified within the log source or within an XPath query requires its own path to that location on the endpoint, so each one creates a channel when the endpoint is polled.


Figure 1: In this example, two log types are selected, which creates two channels per remote host that is polled.
XPath query
Figure 2: In this XPath example there are four 'Select Path' variables, which will create four channels per host that is remotely polled.
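To estimate how many channels an XPath query creates, you can count its Select elements that carry a Path attribute. The sketch below assumes the standard Windows event query XML layout (QueryList, Query, Select). The sample query and the function name `count_channels` are hypothetical and are not the query shown in Figure 2.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath query with four Select elements.
SAMPLE_QUERY = """
<QueryList>
  <Query Id="0" Path="Application">
    <Select Path="Application">*</Select>
    <Select Path="System">*</Select>
    <Select Path="Security">*</Select>
    <Select Path="Microsoft-Windows-PowerShell/Operational">*</Select>
  </Query>
</QueryList>
"""


def count_channels(query_xml):
    """Count Select elements that carry a Path attribute."""
    root = ET.fromstring(query_xml.strip())
    return sum(1 for select in root.iter("Select") if select.get("Path"))


print(count_channels(SAMPLE_QUERY))  # prints 4
```

The count follows the rule described in this article, where each Select Path creates one channel per remotely polled host.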


How to calculate the number of queries per second
Administrators who experience collection issues with remote polling should ensure that the agent creates 30 or fewer queries per second to prevent RPC errors. The number of queries that the WinCollect agent makes per second depends on the number of hosts that are remotely polled, the number of channels opened per host, and the polling interval.

For example, a WinCollect agent is collecting Application, System, and Security events from 300 endpoints every 10 seconds. In this example, the number of queries per second exceeds the recommended limit of 30 queries per second. When the query limit is exceeded it can generate Syslog LEEF messages from the WinCollect agent for 'RPC server is unavailable'.
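The example above can be worked through in code. The formula image from the original article is not reproduced here, so this sketch assumes that queries per second equals hosts multiplied by channels per host, divided by the polling interval in seconds. That assumption is consistent with the example exceeding the limit, but it is not quoted from the article. The function name is hypothetical.

```python
RECOMMENDED_MAX_QPS = 30  # recommended channel queries per second for one agent


def queries_per_second(hosts, channels_per_host, polling_interval_s):
    """Assumed formula: (hosts * channels per host) / polling interval in seconds."""
    return hosts * channels_per_host / polling_interval_s


# 300 endpoints, three logs (Application, System, Security), polled every 10 seconds.
qps = queries_per_second(300, 3, 10)
print(qps, qps > RECOMMENDED_MAX_QPS)  # prints 90.0 True
```

Under this assumption, the example agent makes about three times the recommended number of queries per second.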




The error message
<13>Apr 17 10:54:41 myhostname.com LEEF:1.0|IBM|WinCollect|7.2|7|src=myhostname.com dst=10.10.10.10 sev=5 log=Device.WindowsLog.EventLogMonitor msg=Failed to open event log myhostname.com [myhostname.com:Security]; will try again in approx 60 seconds. Reason: Error code 0x06BA: The RPC server is unavailable. 


How to reduce the number of queries per second
The best method to reduce the number of queries per second for a WinCollect agent is to edit the log source and extend the polling interval. Tuning the collection frequency of the remote polling parameters is the easiest method for administrators to ensure that they do not exceed the maximum query rate for an individual WinCollect agent. Agents that exceed 30 queries per second can experience RPC error messages from the Windows host when they attempt to remotely poll for data.


Table 3 outlines how extending the polling interval allows a single agent that polls a large number of remote hosts to avoid RPC error messages and stay below the recommended 30 queries per second.

Table 3: An example of how adjusting the polling interval alters the number of queries per second.

NOTE: A Google docs spreadsheet of Table 3 is available here: https://docs.google.com/spreadsheets/d/15CyAq9bnZgwd50_8s5NwTIyWt1knlPpHFpdQUvIJLJo/edit?usp=sharing.
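A related calculation is the shortest polling interval that keeps an agent at or below 30 queries per second. The sketch below is not a copy of Table 3. It assumes that queries per second equals hosts multiplied by channels per host, divided by the polling interval in seconds, and the function name and host counts are hypothetical.

```python
def min_polling_interval_s(hosts, channels_per_host, max_qps=30):
    """Smallest whole number of seconds that keeps the agent at or below max_qps."""
    total_queries = hosts * channels_per_host
    # Integer ceiling division avoids floating point rounding surprises.
    return (total_queries + max_qps - 1) // max_qps


# Hypothetical host counts, each collecting three event logs.
for hosts in (100, 300, 500):
    print(hosts, "hosts ->", min_polling_interval_s(hosts, 3), "seconds")
```

For 300 hosts with three channels each, this gives a polling interval of 30 seconds, which corresponds to a Polling Interval (ms) value of 30000 in the log source.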

My agent reports "WARN: Reopening event log" messages. What does this mean?

By default, WinCollect is tuned for workstations and basic event collection. If the log source for the agent is not tuned when collecting on a Windows server or Domain Controller (DC), then the agent can quickly fall behind on its event collection. When this occurs, the agent generates the following Syslog LEEF message and sends it to the QRadar appliance that is configured as your Status Server:

<13>Sep 22 09:07:56 IPADDRESS LEEF:1.0|IBM|WinCollect|7.2|7|src=MyHost.example.com dst=10.10.10.10 sev=4 log=Device.WindowsLog.EventLog.MyHost.example.com.System.Read msg=Reopening event log due to falling too far behind (approx 165 logs skipped). Incoming EPS r.avg/max = 150.50/200.00. Approx EPS possible with current tuning = 40.00
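The numbers in this message can be extracted with a regular expression, which helps when you compare many agents. This is a minimal sketch based only on the sample message in this article. The function name `parse_reopen_warning` is hypothetical, and other WinCollect versions might word the message differently.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"
WARN_PATTERN = re.compile(
    r"approx (?P<skipped>\d+) logs skipped\)\. "
    rf"Incoming EPS r\.avg/max = (?P<avg>{NUMBER})/(?P<max>{NUMBER})\. "
    rf"Approx EPS possible with current tuning = (?P<possible>{NUMBER})"
)


def parse_reopen_warning(line):
    """Return the counters from a 'Reopening event log' message, or None."""
    match = WARN_PATTERN.search(line)
    if match is None:
        return None
    return {
        "skipped": int(match["skipped"]),
        "avg_eps": float(match["avg"]),
        "max_eps": float(match["max"]),
        "possible_eps": float(match["possible"]),
    }


sample = (
    "msg=Reopening event log due to falling too far behind "
    "(approx 165 logs skipped). Incoming EPS r.avg/max = 150.50/200.00. "
    "Approx EPS possible with current tuning = 40.00"
)
info = parse_reopen_warning(sample)
print(info["avg_eps"] > info["possible_eps"])  # prints True: tuning is too low
```

When the average incoming EPS is higher than the EPS possible with the current tuning, as in this sample, the log source needs a higher tuning profile or a shorter polling interval.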


How do I view agent status events in QRadar?
The Status Server is responsible for forwarding messages related to agent status to QRadar. To view these LEEF messages in the QRadar user interface, select an agent in the WinCollect agent list and click the Show Events button. These events are written to C:\Program Files\IBM\WinCollect\logs\WinCollect_Device.log on the WinCollect agent and are also sent to the QRadar appliance as LEEF syslog messages.

Procedure
  1. Log in to QRadar as an admin user.
  2. Click the Admin tab.
  3. Click the WinCollect icon.
  4. Select a WinCollect agent from the agent list.
  5. Click the Show Events icon.

    Fig 5: The Log Activity tab is opened to show agent specific status events.
     
  6. If you do not see any events, this could indicate that you did not include a Status Server address when you installed WinCollect or there is an issue with the WinCollectSvc.exe on the Windows host. The LEEF events are only generated by WinCollect agents that were installed with a Status Server address for forwarding agent messages.


How do I view agent status events from the WinCollect agent?
The administrator can also review the WinCollect_Device.log to view warning messages. For example:

WARN  Device.WindowsLog.EventLog.IP Address.Security.Read : Reopening event log due to falling too far behind (approx 337133 logs skipped). Incoming EPS r.avg/max = 77.91/226.00. Approx EPS possible with current tuning = 40.00 

How do I tune my log source in the QRadar user interface?


Log source tuning is done on a per log source basis. When an administrator is tuning log sources that remotely poll, they should ensure that they do not exceed 2,500 EPS on the agent. A locally installed WinCollect agent can support up to 5,000 EPS when collecting from a high event rate server, such as a domain controller.

The administrator can tune a WinCollect log source by adjusting the default log source configurations. The example below is of the default tuning and is intended to guide administrators to the location of these tuning parameters.
 

  1. Log in to QRadar as an Administrator.
  2. Click the Admin tab.
  3. Click the Log Sources icon.
  4. Edit the configuration of the WinCollect log source that is falling behind.
  5. Edit the Event Rate Tuning Profile, if required.
  6. Edit the Polling interval (ms), if required.
    Note: Never tune the polling interval below 300 ms, because lower values can cause stability issues. In edge cases, it is better to select a higher Event Rate Tuning Profile than to push the polling interval below 300 ms.
  7. Save the log source configuration.
    After the log source is saved, administrators need to wait for the Configuration Polling Interval of the agent to elapse. The QRadar appliance sends a new configuration to the remote WinCollect agent and the tuning parameters are implemented.
  8. The administrator can wait a few minutes, then reexamine the device log on the WinCollect agent to determine if the next warning message includes the updated Approximate EPS possible with current tuning.

    For example, the agent status message from the QRadar user interface:
    <13>Sep 22 09:07:56 IPADDRESS LEEF:1.0|IBM|WinCollect|7.2|7|src=MyHost.example.com dst=10.10.10.10 sev=4 log=Device.WindowsLog.EventLog.MyHost.example.com.System.Read msg=Reopening event log due to falling too far behind (approx 165 logs skipped). Incoming EPS r.avg/max = 150.50/200.00. Approx EPS possible with current tuning = 40.00

    Or from the WinCollect_Device.log on the Windows host:
    WARN  Device.WindowsLog.EventLog.IP Address.Security.Read : Reopening event log due to falling too far behind (approx 337133 logs skipped). Incoming EPS r.avg/max = 217.91/326.00. Approx EPS possible with current tuning = 200.00 
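When a warning still shows incoming EPS above the EPS possible with the current tuning, as in the second example above (217.91 average against 200.00 possible), a higher setting is needed. The sketch below picks the lowest combination of tuning profile and polling interval whose low-end EPS, as listed earlier in this article, covers the reported maximum incoming EPS. Choosing by the low end of each range is a conservative assumption of this sketch, not an IBM recommendation, and the function name is hypothetical.

```python
# (profile, polling interval in ms, low end of the approximate EPS range),
# copied from the EPS figures listed earlier in this article.
TUNING_OPTIONS = [
    ("Default (Endpoint)", 3000, 33),
    ("Default (Endpoint)", 1000, 100),
    ("Typical Server", 3000, 166),
    ("High Event Rate Server", 3000, 416),
    ("Typical Server", 1000, 500),
    ("High Event Rate Server", 1000, 1250),
]


def suggest_tuning(incoming_max_eps):
    """Return the lowest (profile, interval_ms) that covers the EPS, or None."""
    for profile, interval_ms, low_eps in sorted(TUNING_OPTIONS, key=lambda o: o[2]):
        if low_eps >= incoming_max_eps:
            return profile, interval_ms
    return None  # beyond the listed values; consider a local agent or more agents


print(suggest_tuning(326.00))  # prints ('High Event Rate Server', 3000)
```

A result of None means that the reported rate is above every value listed in this article for these two polling intervals.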


Results
After you tune your WinCollect log source, you continue to receive warning messages while the agent works through its backlog, but the number of logs skipped in each message should decrease. In the example event, the system skipped 337133 logs. Messages that are generated after the configuration change should report progressively lower numbers.

For example:
(approx 336112 logs skipped)
(approx 334215 logs skipped)
(approx 333487 logs skipped)
(approx 332239 logs skipped)
 ....
 ....
(approx 200 logs skipped)


The logs skipped value continues to drop until your system catches up with the event log. At that point, the warning message stops for the WinCollect agent.
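To confirm that the agent is catching up, you can compare the logs skipped value across successive warning messages. The following sketch assumes that the warning lines have already been read from WinCollect_Device.log or exported from QRadar in chronological order. The function names are hypothetical.

```python
import re

SKIPPED = re.compile(r"approx (\d+) logs skipped")


def skipped_counts(lines):
    """Extract the 'logs skipped' value from each line that contains one."""
    counts = []
    for line in lines:
        match = SKIPPED.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def is_catching_up(counts):
    """True when every value is lower than the one before it."""
    return len(counts) >= 2 and all(b < a for a, b in zip(counts, counts[1:]))


lines = [
    "(approx 336112 logs skipped)",
    "(approx 334215 logs skipped)",
    "(approx 333487 logs skipped)",
    "(approx 332239 logs skipped)",
]
print(is_catching_up(skipped_counts(lines)))  # prints True
```

If the function returns False over a reasonable observation period, the backlog is not shrinking, and the troubleshooting section of this article applies.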


 

Troubleshooting: Reopening Event Log Due to Falling too Far Behind

If the number of logs skipped does not decrease, then the administrator might be required to delete the bookmark file for the WinCollect agent. The bookmark allows WinCollect to keep its read position in the event log when the Windows host is restarted. It also contains the spillover cached events that are held when a network issue occurs between the WinCollect agent and the QRadar appliance, as well as events that are held for sending at a later time by destinations that use the 'Store & Forward' feature.


WARNING: This procedure will reset the bookmarks the WinCollect agent keeps for all log sources. Any agents that have fallen behind in their log collection will have their bookmarks reset to the time when the WinCollect service restarts on the Windows host. This means that any events in the cache will be lost and the next poll for events will be for the newest event in the event log. There is no way to restore the bookmark to collect the old logs after the persistence files are deleted.


Before you begin
Before you start this procedure, read the WARNING above. If you are uncomfortable with this procedure or want to talk about the implications of this action, you should contact QRadar support.

Procedure
If the WinCollect agent has fallen too far behind, or the remote event log has been cleared or rolled over due to an extended outage, then it might be necessary to reset the bookmark to ensure that events are being sent.
  1. Log in to the Windows system that is hosting the WinCollect agent.
  2. Click Start > Run > Services.msc.
  3. Locate the WinCollect service and click Stop.
  4. Click Start > Run and type %ProgramData%\WinCollect\Data.

    Note: If you are using an older Windows operating system, the directory path might be %ALLUSERSPROFILE%\WinCollect.
  5. There are two folders that the administrator should be aware of:

    C:\ProgramData\WinCollect\Data\Events
    C:\ProgramData\WinCollect\Data\PersistenceManager


    It is recommended that the administrator zip these two folders before taking any action.
  6. To reset the bookmark file, the administrator can delete the PersistenceManager directory, which resets the last record.

  7. Restart the WinCollect service.

    When the WinCollect agent restarts, it identifies that the PersistenceManager directory is missing and needs to be recreated. The PersistenceManager directory is recreated, and the first poll of each event log creates new bookmark files for the event logs (System, Application, Security, etc). The new bookmarks are set to the newest event in the associated event log. Any events not in the spillover cache "Events" directory will not be forwarded by WinCollect.
  8. The administrator should then review the WARN messages to see if the number of logs skipped is now counting down.

    For example:
    WARN  Device.WindowsLog.EventLog.IP Address.Security.Read : Reopening event log due to falling too far behind (approx 330023 logs skipped). Incoming EPS r.avg/max = 77.91/226.00. Approx EPS possible with current tuning = 300.00
  9. If the WARN messages are still not counting down, the WinCollect service needs to be stopped again before proceeding to the next step.
  10. At this point, it might be required for the administrator to investigate the C:\ProgramData\WinCollect\Data\Events directory. This directory stores cached events that have not yet been forwarded to the QRadar appliance. It is possible that a corrupted event is creating an issue for WinCollect.

  11. There are two options that the administrator has at this time:
    1. Selectively delete the largest files.
      The administrator can review the size of the files in this directory. A reliable workaround in most cases is to delete the Events directory, then restart the WinCollect service. However, this removes all cached events waiting to be sent to QRadar. When support reviews these issues, it is common to look for the largest files within the Events directory and remove these, then restart the WinCollect service.

      For example, when reviewing this directory, there is a file that is 1 GB in size. This file is typically deleted to see if the other events in the cache start sending after the service is restarted.
       
    2. Remove all cached events.
      If the administrator does not care about the old events in the cache, the entire Events directory can be deleted, which will resolve any corrupted event issues or problems. However, this means that all events in the existing cache are lost.
       
  12. Restart the WinCollect service.
    1. If the administrator cleared a large file, the WARN message should be reviewed to determine if the agent is still significantly behind. If this is the case, then repeat step 11a.
    2. If the administrator cleared the entire Events directory, the WARN messages should no longer be written to the device log as the backlog of events has been cleared. New logs should be arriving at QRadar.
  13. If the issue persists after all steps have been completed, then the administrator should contact QRadar support for assistance.
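The backup recommended in step 5 and the file size review in step 11 of the procedure above can be scripted. The following is a minimal sketch using only the Python standard library. It assumes that Python is available on the Windows host and that the WinCollect service is already stopped. The function names and the backup location are hypothetical, and nothing here deletes any files.

```python
import shutil
from pathlib import Path


def backup_dirs(data_dir, backup_dir, names=("Events", "PersistenceManager")):
    """Zip each named subdirectory of data_dir into backup_dir.

    Returns the list of archive paths that were created.
    """
    data_dir, backup_dir = Path(data_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for name in names:
        source = data_dir / name
        if source.is_dir():
            # make_archive appends ".zip" and returns the archive path.
            archive = shutil.make_archive(
                str(backup_dir / name), "zip", root_dir=str(source)
            )
            archives.append(archive)
    return archives


def largest_files(directory, count=5):
    """Return up to count files under directory, largest first."""
    files = [p for p in Path(directory).rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_size, reverse=True)[:count]


# Example usage on the WinCollect host (hypothetical backup location):
#   backup_dirs(r"C:\ProgramData\WinCollect\Data", r"C:\Temp\WinCollectBackup")
#   for path in largest_files(r"C:\ProgramData\WinCollect\Data\Events"):
#       print(path.stat().st_size, path)
```

Keeping the archives allows the cached events to be handed to QRadar support for review, even though the bookmarks cannot be restored to collect old logs after the persistence files are deleted.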


Note: If corrupted events are interrupting event collection with WinCollect, consider moving from WinCollect to the MSRPC protocol. Because of the API that it uses, WinCollect can become stuck on a corrupted event. The MSRPC protocol skips a corrupted event in the Windows event log and continues to receive data without interruption. For more information, see Agentless Windows Event Collection using the MSRPC Protocol.


Document Information

Modified date:
26 February 2021

UID

swg21672193