Security Guardium - Identifying and resolving common sniffer problems with the Buffer Usage report.
How can I use the Guardium Buffer Usage report to identify common problems with the sniffer?
What columns in the report are relevant?
How can I alert on common sniffer problems?
How can I resolve common sniffer problems?
The Guardium sniffer, also known as inspection-core, analyzes and parses all traffic coming into a collector and logs it in the internal database. The Buffer Usage report displays a wide range of sniffer and system performance information, sampled every minute. The report can be found in:
v9 - Guardium Monitor > Buffer Usage Monitor
v10 - Reports > Guardium Operational Reports > Buff Usage Monitor
Problem - Buffer usage process not running
Key columns in the report - Timestamp.
The buffer usage process runs on the system to monitor the sniffer and populate this report. If this process is not running, the report will not be populated. A common cause is that the internal database previously filled up to 90% and the buffer usage process was not restarted afterwards.
Check the Timestamp column of the report against the report's run-time parameters. Does the report have one row for every minute up to the current time? If not, the buffer usage process should be restarted. Follow the steps here to restart the process.
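If you export the report rows for offline checking, the "one row every minute" test can be automated. The following is a minimal sketch, assuming the Timestamp values have already been parsed into Python datetime objects; the function name and the 90-second tolerance are illustrative choices, not part of Guardium.

```python
from datetime import datetime, timedelta

def find_missing_minutes(timestamps, now, tolerance=timedelta(seconds=90)):
    """Return (start, end) gaps where Buffer Usage rows are missing.

    Hypothetical helper: a gap is any pair of consecutive rows more than
    `tolerance` apart, plus a trailing gap if the newest row is older than
    `tolerance` relative to `now`. An empty report is treated as one gap
    with start=None, since the process is then clearly not populating it.
    """
    if not timestamps:
        return [(None, now)]
    ts = sorted(timestamps)
    gaps = [(prev, cur) for prev, cur in zip(ts, ts[1:]) if cur - prev > tolerance]
    if now - ts[-1] > tolerance:
        gaps.append((ts[-1], now))
    return gaps

# Example: rows at 10:00, 10:01 and 10:04 leave a gap between 10:01 and 10:04.
base = datetime(2017, 2, 12, 10, 0)
rows = [base, base + timedelta(minutes=1), base + timedelta(minutes=4)]
print(find_missing_minutes(rows, now=base + timedelta(minutes=4)))
```

Any gap returned, especially a trailing one that reaches the current time, suggests the buffer usage process stopped and should be restarted.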
Problem - Analyzer Queue Overflow
Key columns in the report - Analyzer Rate, Analyzer Queue, Flat Log Requests.
The analyzer rate is the rate of data coming into the sniffer. Its value varies with the appliance and the traffic, so there is no generic value that indicates a problem and no generic 'best practice' value.
The analyzer queue is the amount of traffic queued for analysis. This value will normally go up and down as the queue grows and is processed. If the queue stays consistently high, it is very likely to cause a problem.
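Because "consistently high" depends on the appliance, any automated check needs a threshold you choose yourself. The sketch below assumes the Analyzer Queue values are available as a list of numbers in time order; the threshold and minimum run length are hypothetical parameters, not Guardium defaults. The same check could be applied to the Logger Queue column.

```python
def sustained_high(values, threshold, min_consecutive):
    """Return (start, end) index ranges where values stay >= threshold
    for at least `min_consecutive` consecutive samples (one sample per
    Buffer Usage row, i.e. roughly one per minute)."""
    runs = []
    start = None
    for i, v in enumerate(values):
        if v >= threshold:
            if start is None:
                start = i  # a high run begins here
        else:
            if start is not None and i - start >= min_consecutive:
                runs.append((start, i - 1))
            start = None
    # close a run that is still open at the end of the data
    if start is not None and len(values) - start >= min_consecutive:
        runs.append((start, len(values) - 1))
    return runs

# A queue that rises and drains is not flagged; one that stays high is.
print(sustained_high([1, 5, 6, 7, 2, 8, 9], threshold=5, min_consecutive=3))
```

Short spikes that drain again are expected behavior; only long runs above your chosen threshold are worth investigating.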
The analyzer part of the sniffer has a circular buffer. When the queue is full, any incoming data is dropped. The amount of data dropped by the analyzer in the last minute is recorded in the Flat Log Requests column, so if data was dropped in the last minute, flat log requests will increase. Increasing flat log requests is the key indicator of analyzer queue overflow; on a healthy sniffer this value should not be increasing.
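The key indicator above can be checked by comparing each row with the previous one. This is a minimal sketch, assuming the report has been exported as (timestamp, flat log requests) pairs in time order; the function name is illustrative.

```python
def flat_log_increases(rows):
    """rows: list of (timestamp, flat_log_requests) tuples in time order.

    Return the timestamps of rows where Flat Log Requests rose compared
    with the previous row, i.e. minutes in which the analyzer dropped data.
    """
    return [ts for (_, prev), (ts, cur) in zip(rows, rows[1:]) if cur > prev]

rows = [("10:00", 0), ("10:01", 0), ("10:02", 3), ("10:03", 3), ("10:04", 7)]
print(flat_log_increases(rows))  # minutes with analyzer drops
```

On a healthy sniffer this list should be empty.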
Alerting on Analyzer Queue Overflow
Analyzer queue is tracked in the Unit Utilization Level and Deployment Health View (v10.1 and above). A red status on analyzer queue in these views indicates a likely analyzer queue overflow. The buffer usage report on the individual collector can then be checked to confirm.
To alert directly on flat log requests you can use the technote here.
Resolving Analyzer Queue Overflow
If the analyzer queue is overflowing it means the traffic is coming into the appliance faster than the analyzer can process it. Improvements in the latest sniffer patches will help, but reducing the amount of traffic to the collector is often the best solution, for example by:
- Applying the Ignore S-TAP Session action to more traffic in the policy
- Moving S-TAPs to a less loaded collector
- Load balancing traffic between more than one collector
- Adding more collectors to the environment
Problem - Logger Queue Overflow
Key columns in the report - Logger Rate, Logger Queue, Mem Snif, TID (or PID)
The logger rate is the amount of traffic being logged into the internal database.
The logger queue is the amount of traffic queued for logging. This value will normally go up and down as the queue grows and is processed. If the queue stays consistently high, it is very likely to cause a problem.
The logger part of the sniffer has a non-circular buffer that is held in the sniffer memory. If the logger queue increases, the amount of memory used by the sniffer (mem snif) will increase. Once the memory has been used by the sniffer, it will not be released until the sniffer restarts. This means you will see the mem snif column increase as logger queue increases, but never decrease unless the sniffer restarts.
The sniffer memory has a maximum limit. On a 64-bit appliance the default is 1/3 of the total appliance RAM; this can be changed from the CLI with support store snif_memory_max. On a 32-bit appliance the limit is 2.5GB.
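To know how close Mem Snif is to its ceiling, it helps to work out the default limit for your appliance. The arithmetic from the paragraph above is sketched below; the function name is hypothetical, and it does not model a value changed with support store snif_memory_max.

```python
def default_sniffer_memory_max_gb(total_ram_gb, is_64bit=True):
    """Default sniffer memory ceiling in GB, per the rules above:
    1/3 of total RAM on a 64-bit appliance, a fixed 2.5GB on 32-bit.
    A limit changed via snif_memory_max is not reflected here."""
    if is_64bit:
        return total_ram_gb / 3
    return 2.5

print(default_sniffer_memory_max_gb(24))                  # 64-bit, 24GB RAM
print(default_sniffer_memory_max_gb(8, is_64bit=False))   # 32-bit
```

For example, a 64-bit collector with 24GB of RAM has a default ceiling of 8GB, so a Mem Snif value approaching 8GB means a restart is close.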
If the logger queues stay high, eventually the maximum memory will be reached. At that point the sniffer automatically restarts and the data in the queues is dropped.
When the sniffer restarts, its process ID (the TID or PID column) changes, indicating that a new sniffer process has started.
A high logger queue and Mem Snif reaching its maximum, followed by a change of TID, indicate a logger queue overflow problem.
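The three-part pattern above can be checked row by row. This is a sketch under stated assumptions: the Row type, the queue threshold and the 95% "near maximum" fraction are all hypothetical, and Mem Snif and mem_max must be expressed in the same units.

```python
from dataclasses import dataclass

@dataclass
class Row:
    """One Buffer Usage row (hypothetical representation)."""
    timestamp: str
    logger_queue: float
    mem_snif: float   # same units as mem_max below
    tid: int

def logger_overflow_restarts(rows, mem_max, queue_threshold, mem_fraction=0.95):
    """Return timestamps of TID changes where the previous row showed a
    high logger queue AND Mem Snif at or near its maximum."""
    hits = []
    for prev, cur in zip(rows, rows[1:]):
        restarted = cur.tid != prev.tid
        if (restarted
                and prev.logger_queue >= queue_threshold
                and prev.mem_snif >= mem_fraction * mem_max):
            hits.append(cur.timestamp)
    return hits

rows = [
    Row("10:00", 10, 1000, 111),
    Row("10:01", 90, 7800, 111),  # high queue, memory near an 8000 ceiling
    Row("10:02", 0, 500, 222),    # new TID: restart after overflow
    Row("10:03", 5, 600, 222),
    Row("10:04", 1, 700, 333),    # new TID, but no overflow pattern before it
]
print(logger_overflow_restarts(rows, mem_max=8000, queue_threshold=50))
```

Restarts that do not match the pattern, such as the one at 10:04 in the example, point to another cause.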
Logger queue overflow is not the only possible cause of sniffer restarts. The sniffer can also be restarted manually from the CLI, or it may be crashing.
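Since every restart changes the TID, counting TID changes over a period gives the number of sniffer restarts, whatever their cause. A minimal sketch, assuming the TID column has been exported as a list in time order; what counts as "too many" restarts is left for you to decide.

```python
def count_sniffer_restarts(tids):
    """Count TID changes between consecutive Buffer Usage rows.
    Each change means a new sniffer process started (overflow,
    manual CLI restart, or crash)."""
    return sum(1 for prev, cur in zip(tids, tids[1:]) if cur != prev)

print(count_sniffer_restarts([111, 111, 222, 222, 333]))
```

The count alone does not distinguish the causes, so compare each restart with the logger queue and Mem Snif values just before it.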
Alerting on Logger Queue Overflow
Logger queue is tracked in the Unit Utilization Level and Deployment Health View (v10.1 and above). A red status on logger queue in these views indicates a likely logger queue overflow. The buffer usage report on the individual collector can then be checked to confirm.
To alert directly on a high number of sniffer restarts you can use the technote here.
Resolving Logger Queue Overflow
Reducing the traffic, as for analyzer queue overflow, will help to some extent; however, the amount of data is not the most common cause of logger queue overflow. Reducing the amount of data logged by intensive logging actions in the policy will have more impact. Sniffer patches are more likely to resolve specific issues leading to high logger queues. Decreasing the workload on the internal database will also improve logger performance, for example by running Audit Processes on an aggregator where possible. If the logger queue overflow correlates with a specific scheduled job, that job's impact on database performance is the likely cause. Overall, for logger queue issues you can:
- Install the latest sniffer patch from Fix Central on the appliance
- Reduce amount of traffic logged with 'Log Full Details' or 'Alert per Match' policy actions. See configuring your policy to prevent appliance problems for more details.
- Investigate any scheduled jobs that correlate with logger queue overflow. Guardium support can assist if required.
If the above information does not resolve your problem, Guardium support can assist. If you need to open a PMR, please attach:
- Output of the CLI command support must_gather sniffer_issues.
- Detailed description of the problem with reference to the information in this technote.
IBM Security Guardium
Software version: 9.0, 9.1, 9.5, 10.0, 10.0.1, 10.1, 10.1.2
Operating system(s): AIX, HP-UX, Linux, Solaris, Windows, z/OS
Software edition: All Editions
Reference #: 1994083
Modified date: 12 February 2017