IBM Support

QRadar: HA synchronization progress resets to 0%

Troubleshooting


Problem

When a full Distributed Replicated Block Device (DRBD) synchronization runs in a QRadar high-availability (HA) deployment, the synchronization progress can reset to 0%. This does not mean that the synchronization was reset and must start over. The 0% value is a temporary indicator that is shown until the synchronization percentage is recalculated, and it does not indicate an actual problem.

Symptom

When you monitor the total Distributed Replicated Block Device synchronization progress, the overall progress might have reached some higher value. At some point, the progress resets to 0% and appears to start again.

Cause

Several events can cause the progress percentage to reset, such as a full deployment, a lost link, or a spontaneous network outage.

Resolving The Problem

When the progress resets to 0%, the full sync did not start over. It resumes from the point where the previous sync stopped. To verify this, run cat /proc/drbd. The output is similar to the following:
Sun Mar 24 11:14:32 +00 2019
 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:1089876000 dw:1089866364 dr:0 al:0 bm:0 lo:41 pe:42 ua:41 ap:0 ep:1 wo:d oos:18121311576
    [=======>............] sync'ed: 43.3% (17696592/31165372)M
    finish: 104:46:52 speed: 48,024 (57,796) want: 102,400 K/sec
Example 1
In Example 1, the sync is 43.3% complete. The key field is oos (out of sync), which reports in kilobytes how much data is still out of sync. Here it is 18121311576 KB, which corresponds to the 17696592 MB shown as the first number after the "sync'ed" percentage. The second number, 31165372 MB, is the total amount that had to be transferred when this sync began.
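The relationship between the oos field and the two numbers after the percentage can be checked with a little arithmetic. The following Python sketch is illustrative only: the function name parse_sync_status and the regular expressions are assumptions written for this example, not part of QRadar or DRBD, and the sample text is copied from Example 1. Because the MB figures in the output are rounded, the computed values agree with the displayed ones only approximately.

```python
import re

# Sample text copied from Example 1 (timestamp line omitted).
SAMPLE = """\
 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:1089876000 dw:1089866364 dr:0 al:0 bm:0 lo:41 pe:42 ua:41 ap:0 ep:1 wo:d oos:18121311576
    [=======>............] sync'ed: 43.3% (17696592/31165372)M
"""


def parse_sync_status(text):
    """Extract oos (KB), displayed percent, remaining MB, and total MB.

    Hypothetical helper for illustration; it assumes the text contains
    one oos field and one sync'ed progress line.
    """
    oos_kb = int(re.search(r"oos:(\d+)", text).group(1))
    progress = re.search(r"sync'ed:\s*([\d.]+)%\s*\((\d+)/(\d+)\)M", text)
    return {
        "oos_kb": oos_kb,
        "percent": float(progress.group(1)),
        "remaining_mb": int(progress.group(2)),
        "total_mb": int(progress.group(3)),
    }


status = parse_sync_status(SAMPLE)

# oos is reported in KB; divide by 1024 to compare with the MB figure.
oos_mb = status["oos_kb"] / 1024

# Fraction completed, derived from the remaining/total pair.
done = 1 - status["remaining_mb"] / status["total_mb"]

print(f"oos: {status['oos_kb']} KB is about {oos_mb:.0f} MB")
print(f"remaining/total: {status['remaining_mb']}/{status['total_mb']} MB")
print(f"computed progress: {done:.1%} (displayed: {status['percent']}%)")
```

On a live system, the same function could be given the contents of /proc/drbd instead of the sample string.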
In the scenario where it drops to 0%, the /proc/drbd output can look similar when it resumes:
Sun Mar 24 11:21:03 +00 2019
 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:153775024 dw:153770416 dr:0 al:0 bm:0 lo:21 pe:43 ua:20 ap:0 ep:1 wo:d oos:17966538772
    [>....................] sync'ed: 0.9% (17545448/17694024)M
    finish: 85:12:25 speed: 58,568 (55,504) want: 16,040 K/sec
Example 2
There are a few key things to look at in Example 2. The oos field picked back up where the previous sync left off. If the sync started over from the beginning, oos would be set to 31913340928 KB (31165372 MB x 1024). Also, the two numbers after the sync percentage changed. This latest sync started with 17694024 MB to synchronize, and 17545448 MB is left, so 0.9% is shown as completed. The progress percentage dropped back to 0%, but the sync resumed from the previous state, and the new percentage is based only on what was left to synchronize.
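The comparison between the two examples can be written out as a short calculation. This is a minimal sketch that uses only the numbers shown in Example 1 and Example 2; the function name resumed and the constant names are hypothetical, and the check is a simplification that ignores new writes on the primary, which could raise oos slightly on a busy system.

```python
KB_PER_MB = 1024

# Values copied from Example 1 and Example 2.
ORIGINAL_TOTAL_MB = 31165372   # second number after sync'ed in Example 1
OOS_BEFORE_KB = 18121311576    # oos in Example 1
OOS_AFTER_KB = 17966538772     # oos in Example 2
WINDOW_TOTAL_MB = 17694024     # second number after sync'ed in Example 2
WINDOW_LEFT_MB = 17545448      # first number after sync'ed in Example 2


def resumed(oos_before_kb, oos_after_kb):
    """Return True when the later snapshot has no more out-of-sync data
    than the earlier one, which suggests the sync resumed (simplified check)."""
    return oos_after_kb <= oos_before_kb


# What oos would be if the sync had truly started over from the beginning.
full_restart_oos_kb = ORIGINAL_TOTAL_MB * KB_PER_MB

# Progress within the current window; this is what the percentage reflects.
window_progress = 1 - WINDOW_LEFT_MB / WINDOW_TOTAL_MB

# Progress measured against the original total from Example 1.
overall_progress = 1 - (OOS_AFTER_KB / KB_PER_MB) / ORIGINAL_TOTAL_MB

print(f"oos after a full restart would be: {full_restart_oos_kb} KB")
print(f"resumed from previous state: {resumed(OOS_BEFORE_KB, OOS_AFTER_KB)}")
print(f"progress in current window: {window_progress:.1%}")
print(f"progress against original total: {overall_progress:.1%}")
```

Measured against the original total, the sync in Example 2 is about 43.7% complete, even though the progress line shows less than 1%.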
To review this Distributed Replicated Block Device information from an earlier time, check the systemStabMon logs in /var/log/systemStabMon/YYYY/MM/DD/drbd.log. Logs for all days other than the current day are compressed as .gz files and can be viewed with the zless command. The drbd.log file contains snapshots of this output taken throughout the day, so you can narrow down the time when the synchronization progress reset. From there, you can verify that the oos field did not return to its original value and that the sync resumed from the state it was in before the progress reset.
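Reviewing many snapshots by eye can be tedious, so the oos check can be scripted. The following is a hedged sketch, not an IBM-provided tool: the helper names open_log, oos_values, and find_increases are hypothetical, and it assumes that each snapshot in drbd.log contains an oos: field in the same form as /proc/drbd and that compressed logs end in .gz.

```python
import gzip
import re

OOS_RE = re.compile(r"\boos:(\d+)")


def open_log(path):
    """Open a drbd.log file as text, handling .gz archives transparently."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def oos_values(lines):
    """Yield each oos value (in KB) found in an iterable of log lines, in order."""
    for line in lines:
        match = OOS_RE.search(line)
        if match:
            yield int(match.group(1))


def find_increases(values):
    """Return (previous, current) pairs where oos grew between snapshots.

    A large jump back toward the original total would suggest a real
    restart; a steadily shrinking series suggests the sync resumed.
    """
    values = list(values)
    return [(a, b) for a, b in zip(values, values[1:]) if b > a]


# Demonstration on two snapshot lines taken from Example 1 and Example 2.
demo_lines = [
    "ns:0 nr:1089876000 dw:1089866364 dr:0 al:0 bm:0 lo:41 pe:42 ua:41 ap:0 ep:1 wo:d oos:18121311576",
    "ns:0 nr:153775024 dw:153770416 dr:0 al:0 bm:0 lo:21 pe:43 ua:20 ap:0 ep:1 wo:d oos:17966538772",
]
history = list(oos_values(demo_lines))
print("oos history (KB):", history)
print("increases:", find_increases(history))
```

To run this against a real log, pass the file handle returned by open_log to oos_values, for example inside a with block for one day's drbd.log file.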

Document Location

Worldwide

Document Information

Modified date:
15 December 2022

UID

ibm10878206