IBM Support

Checking and resetting the vwlog sequences in the Process Engine's database

Troubleshooting


Problem

Process Engine (PE) 4.5.1 and 5.0 and Content Platform Engine (CPE) 5.2.0 assign a sequentially incrementing sequence number to each new event log record it writes. In PE 4.5 and in PE 5.0, the sequence number is limited by the database schema to be what can be held in a 32-bit signed integer, so a total of 2.1 billion (2147483647) Process Engine "event log" records can be written into a vwlog table before problems are seen. In CPE 5.2.0, the database schema will let the PE write event logs past the 2.1 billion limit but Case Analyzer (CA) will be unable to process any event logs past the 2.1 billion limit.

Symptom

In PE 4.5 and PE 5.0, when the 2.1 billion limit is reached, event logs can no longer be written because the sequence counter won't allocate numbers larger than that limit, and that can cause several issues.

  1. First, the Process Engine will retry the failed event log writes with a pause of a few seconds between each retry. This will be seen as a significant performance slowdown. The event log record cannot be written. But after two retries, the process engine will skip the event log write and continue.
  2. Second, if Case Analyzer (CA) or Process Analyzer (PA) is being used, or if Tracker is being used, the lack of event log records will hurt their ability to function properly.

In CPE 5.2.0, the PE will continue to write event log records because the sequence will allocate numbers up to 9999999999. So basic Process Engine work will continue without problems. But CA will be unable to process any event logs past the 2.1 billion limit.

Cause

In PE 4.5.1 and PE 5.0, once the sequence counter goes beyond what can be held in a 32-bit signed integer, the faults and slowdowns happen. This is due to the Process Engine code using 32-bit integers for this processing.

In CPE releases prior to CPE 5.2.1.2, once the sequence counter goes beyond what can be held in a 32-bit signed integer, the CA software can no longer process events due to 32-bit integer limitations.

Diagnosing The Problem

Note that the following symptoms are what will be seen once you have actually hit the limit. The purpose of this technote is to document a procedure for detecting and handling the problem before the limit is reached. That procedure is in the "Resolving the problem" section of this note.

In PE 4.5.1, once you've reached the limit, the Process Engine elog may contain errors such as:


2014/03/27 00:00:00.751 121,1,1455 <fnsw> VW/Process  (17891552.1.56 0x11100e0.1) ...
VW (vwtime) (Region=1): ISI Exception (Wob  447EF9FA155D11E3BD0100000A575A12) : [Err=790105af]  => [Err=790105af]

Error 790105af can be decoded with vwmsg and it is an Oracle ORA-01455 error which means that a value (the sequence number in this case) does not fit into the data container (a 32-bit signed integer in this case).

In another case, Component Manager can't start but no error is seen on Oracle Server but if the PE's "database" tracing is enabled, an 0x790105af exception from the database can be seen in while trying to write a record to a vwlog table.

In PE 5.0, once you've reached the limit, the Process Engine log will contain errors such as:

"Cannot execute statement, SQL: "SELECT VWLOG1_113SEQ.NEXTVAL FROM DUAL"
ORA-08004: sequence VWLOG1_113SEQ.NEXTVAL exceeds MaxValue and cannot be instantiated "

Another symptom that may be seen is an error in the PE logs indicating that it has reached the maximum value of  '2147483647' for the sequence value in the Oracle Event Log table.

In CPE 5.2 releases before CPE 5.2.1.2, when the F_SeqNumber value goes past the limit, CA will see the F_SequenceNum as a large negative number and CA will fail with an error similar to this:
2015/09/23 15:15:56.422-0700 filenet.4620 5f3d5f3d [Error] Unexpected exception occurred while collecting events for store CAStore ; Exception: java.lang.IllegalStateException: Sequence number went backwards in log agent. Previous: 2147483647, New: -2147483648

Resolving The Problem

The CPE 5.2.1 release supports 64-bit event log sequence numbers. The Case Analyzer service and other Workflow System (Process Engine) related services have been enhanced to use and process 64-bit event log sequence numbers as of CPE 5.2.1.2 and above. So a comprehensive solution is available with CPE 5.2.1.2 and above.

The database schema and process changes required by the comprehensive solution cannot be retrofitted into release prior CPE 5.2.1.

If you move up to CPE 5.2.1.2 (or newer), this whole issue of vwlog sequence number problems disappears.

The following manual procedure helps identify if the system is close to encountering the issue and lets you reset the sequence value so that Process Engine can continue to function on PE 4.5, PE 5.0, and CPE 5.2.0.

IMPORTANT: Note that if you catch the problem before you exceed the limit, it's relatively straightforward to resolve the problem. But we still strongly recommend that you contact L2 and work with them to be sure everything is done correctly, because the resolution involves directly adjusting the PE's database schema and mistakes can have grave consequences and be very difficult to fix.

IMPORTANT: Note that if you are using Case Analyzer and you are going to reset the VWLog sequences as described in this technote, you must also follow the procedures defined in this related CA technote: http://www.ibm.com/support/docview.wss?uid=swg21962784

The event log tables have names like this: VWLog1_427 or VWLog55_113.
The first number (after the "VWLog" and before the underscore) is the region number.
The second number (after the underscore) is a unique number for that region.

The event table names can be found by looking at vwtool's logconfig command output.
In the logconfig output, each event log lists a Physical table name: and that identifies the tables that need to be checked.

Check your event log tables

For each of your event log tables, you'll need to check to see how close it is to the 2.1 billion limit.

For the remainder of this document, the database queries will refer to a vwlog table called VWLog1_427. You will substitute the name(s) of your vwlog table(s) into these commands.

An SQL Query such as select max(F_SeqNumber) from VWLog1_427 can be used to see where newest event log rows are being inserted. The max(F_SeqNumber) value is the largest value in the table and that will be the most recent F_SeqNumber used. Most event logs will be creating new entries with an F_SeqNumber value well below 1 billion.

You might execute the select max(F_SeqNumber) query once a week for a month to get a feel for how fast your event logs are growing. This will give you an estimate on when you'll hit the 2.1 billion limit.

Then, you can continue to monitor the event log tables on a schedule that makes sense based on how quickly you're approaching the 2.1 billion limit.

If one of the event logs is approaching the 2.1 billion limit (based on the max(F_SeqNumber) value), this is a problem and you'll need to reset that sequence (or move up to CPE 5.2.1.2 or newer software).

To resolved the problem, you will need to get rid of any older event log records with small F_SeqNumber values to make room for new event log records. If you're on a CPE 5.2.x release prior to CPE 5.2.1.2 and you're using CA, if you have any F_SequenceNum values above the 2.1 billion limit, you will need to follow extra procedures to "resequence" those values to lower F_SequenceNum values so that CA can process them. And then you must reset the VWLog sequence to allocate smaller values.

Getting rid of older event log records

Ideally, normal vwlog maintenance using the pelog tool will have already pruned all of the very old event log records (see the vwlog or pelog utility for details on pruning event logs). For example, if you're currently allocating new event log records with an F_SeqNumber value of 2,000,000,000 (2 billion), it's likely that all event log records with F_SeqNumber values below 1 billion are no longer needed and should have been pruned already.

SQL queries can be used to verify that an F_SeqNumber range is clear (empty) in a vwlog table.

First, we will make sure the first 200 million slots are available.

To make sure the first 200 million slots are available:
select count(*) from VWLog1_427 where F_SeqNumber < 200000000
If the count is non-zero, that range is not clear and you will need to make it clear.

Ideally, pruning with pelog will have cleared this area by the time you're approaching the 2.1 billion limit. If not, you should prune with pelog (or with vwlog if you're running PE 4.5.1) to clear all event log records with F_SeqNumber values below two million.

In an emergency, and ONLY if Process Analyzer and/or Case Analyzer are NOT being used, you can delete event log records directly with SQL statements. Process Tracker will lose the ability to display some information, but the Process Engine will continue to function.

To clear the first 200 million event log records with an SQL statement:

If there are a small number of entries in that range, they can be deleted with one SQL command:
Delete from VWLog1_427 where F_SeqNumber < 200000000;

Otherwise, the transaction may be too large, and the rows should be deleted using multiple SQL statements, each deleting a smaller chuck of rows (such as 10 million at a time), like this:

DELETE FROM VWLog1_427 WHERE F_SeqNumber < 10000000;
commit work;

DELETE FROM VWLog1_427 WHERE F_SeqNumber < 20000000;
commit work;

DELETE FROM VWLog1_427 WHERE F_SeqNumber < 30000000;
commit work;

And so on up to
DELETE FROM VWLog1_427 WHERE F_SeqNumber < 200000000;
commit work;

Note: the "commit work" statement may not be necessary because sometimes the command line tool automatically commits the transaction on each separate statement.

After having deleted the lower event log records, the following query should return 0, meaning there are NO event log records remaining below 200 million:
select count(*) from VWLog1_427 where F_SeqNumber < 200000000

Now that we have cleared out the lower 200 million slots, if CA is being used, we may need to re-sequence some of the existing event log records.

Resetting the VWLog sequence

Now we can reset the sequence so that the Process Engine starts putting vwlog records in the lower slots rather continuing to grow and hitting the 2.1 billion limit.

IMPORTANT: Shut down the PE software, the CA software, and any component manager instances that may be running, so that nothing is using the PE's database.

Note that the name of the sequence for a vwlog table is the name of the vwlog table itself with SEQ appended at the end. So for our example event log table named VWLog1_427, its sequence is named VWLog1_427SEQ.

The following examples reset the starting sequence value to 1000.

To reset the sequence for Oracle:
drop sequence VWLog1_427SEQ
CREATE SEQUENCE VWLog1_427SEQ START WITH 1000 INCREMENT BY 1 MAXVALUE 2147483647 CACHE 100 ORDER

To reset the sequence for DB2:
drop sequence VWLog1_427SEQ
create sequence VWLog1_427SEQ as AS NUMERIC(10,0) start with 1000 MAXVALUE 2147483647 increment by 1 cache 100 order

To reset the sequence for SQL Server:
dbcc checkident ('VWLog1_427', RESEED, 1000);
GO

Do not restart the software yet.

If you are using CPE 5.2 and CA, and have F_SeqNumber values above 2.1 billion

On PE 4.5 and PE 5.0, the database will not let the PE software allocate numbers above the 2.1 billion limit. But on CPE 5.2, the database will allow the CPE to write event log records above the 2.1 billion limit... that is, with F_SeqNumber values > 2147483647.

If you have event log records with F_SeqNumber values larger than 2147483647 and a CA release earlier than CA 5.2.1.2 is being used, you will need to contact L2 and we will need to devise a plan that will involve "re-sequencing" the F_SeqNumber values for the event log records that CA has not yet processed so that they reside in the lower range you have just opened up. This is required so that the pre-5.2.1.2 CA software can process those event log records.

And this task is not easy because the underlying database systems do not generally allow us to simply update the F_SeqNumber column values when sequences are involved.

IMPORTANT: This task is beyond what we want customers doing on their own and we require that L2 be contacted and be involved in the planning and implementation of this part of the recovery operation.

Once all of the event log records that CA couldn't process have been re-inserted with smaller F_SeqNumber values, you can continue on to the next step.

Now Restart the Software

If you're using CA, make sure you have also reset the CA so it will reprocess the smaller F_SeqNumber event log records using the process in this CA technote: http://www.ibm.com/support/docview.wss?uid=swg21962784

Now, you're ready to restart the PE software, the CA software, and any component manager instances. The CA should begin to process re-sequenced event log records at this point.

At this point, there will be no problems until the Process Engine runs into an already-existing event log record. We know things are clear up to at least 200 million because that's the area we've cleared out.

Continued Vigilance – Keep checking to make sure you are not approaching the next limit.

Once you have reset the sequence, the query to monitor where the Process Engine is currently allocating new event log records is different. After a reset of the sequence, the new limit is the top of the most recently "cleared out" area. Because you have cleared out the first 200 million rows in the event log table, to see where the Process Engine is currently allocating, you need to see the largest F_SeqNumber value that is below 200 million:
select max(F_SeqNumber) from VWLog1_427 where F_SeqNumber < 200000000;

Do the above query daily or weekly until you have a good estimate on how soon you will be closing in on the new 200 million limit.

When the Process Engine is getting close to allocating the 200 millionth event log, you will need to clear out the next chunk of space in the event log table so that more log records can be added. Again, ideally, pruning will have cleared out this next area already.

For this procedure, you will need to clear out another 200 million chunk of event log records. The next area is 200 million to 400 million.
The query to check if that area is clear is:
select count(*) from VWLog1_427 where F_SeqNum > 199999999 and F_SeqNumber < 400000000;

If the above query returns a count of 0, there are no records in the 200 million to 400 million range and the Process Engine will be able to write log records up to 400 million without any issues.

If there are event log records in that area, event log pruning should be done to clear out that region. Or, if Case Analyzer and Process Analyzer are not being used, the "emergency deletes" can again be used in order to clear space in that event log table.

The emergency deletes of the next chunk of event log records (the range from 200 million to 400 million) would use SQL statements like the following:

If there are only a few record in that range, you could delete them all at once with this command:

Delete from VWLog1_427 where F_SeqNum > 199999999 and F_SeqNumber < 400000000;

Otherwise, delete them in smaller chunks with commands like this:

DELETE FROM VWLog1_427 WHERE (F_SeqNum > 199999999 and F_SeqNumber < 210000000);
commit work;

DELETE FROM VWLog1_427 WHERE (F_SeqNum > 199999999 and F_SeqNumber < 220000000);
commit work;

DELETE FROM VWLog1_427 WHERE (F_SeqNum > 199999999 and F_SeqNumber < 230000000);
commit work;
...
DELETE FROM VWLog1_427 WHERE (F_SeqNum > 199999999 and F_SeqNumber < 400000000);
commit work;

To verify the second 200 million slots are available in the event log table, run this query:
select count(*) from VWLog1_427 where F_SeqNumber > 199999999 and F_SeqNumber < 400000000

Once the above query returns 0, we know that the Process Engine will be able to add event log records at least up to F_SeqNumber values of 400000000.

Now, you'll need to make sure you don't let the Process Engine run into the 400000000 limit.

Because the limit has changed, the monitoring query has to be adjusted as follows:
select max(F_SeqNumber) from VWLog1_427 where F_SeqNumber < 400000000;
We want to know the largest F_SeqNumber value below 400 million.

You would then use the above query to monitor the F_SeqNumber values to see how quickly you're approaching the new 400 million limit. When you get close enough to that limit, you will need to clear out the next chunk (400 million to 600 million) using delete statements similar to the examples above, adjusting the ranges as necessary. And then you will begin to monitor how quickly you're approaching the new 600 million limit.

To check where you are currently allocating, the query is always
select max(F_SeqNumber) from VWLog1_427 where F_SeqNumber < nnn;
with nnn being the top of the most recently cleared out chunk of the VWLog1_427 table.

[{"Product":{"code":"SSTHRT","label":"IBM Case Foundation"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Component":"Process Engine","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF010","label":"HP-UX"},{"code":"PF027","label":"Solaris"},{"code":"PF033","label":"Windows"},{"code":"PF016","label":"Linux"}],"Version":"5.2;5.0;4.5.1","Edition":"","Line of Business":{"code":"LOB45","label":"Automation"}}]

Document Information

Modified date:
17 June 2018

UID

swg21960771