You see long query times on Business Process Definition (BPD) and TASK tables and your process server database tables are occupying too much disk space.
Resolving the problem
The information in this document can help keep your database from growing unbounded.
Completed business process definition instances are not deleted from the system automatically. After a business process definition instance is completed, the instance is typically no longer needed and, therefore, can be removed from the Process Server database. The IBM Business Process Manager, WebSphere Lombardi Edition, and Lombardi Teamworks products provide a stored procedure called LSW_BPD_INSTANCE_DELETE, which you can use to delete old instances. With Lombardi Teamworks V6.1 and later, this stored procedure clears out all runtime data that is associated with this instance in the following tables:
- Dynamic groups created for this instance from:
- Task associated with this instance from:
- BPD Instance data from:
Note: LSW_TASK_ACT exists only in Lombardi Teamworks 6.x databases.
Note: Alternatively, starting with IBM Business Process Manager V8.0.1, you can use the BPMProcessInstancesCleanup command to delete instances and tasks.
As a best practice, call the LSW_BPD_INSTANCE_DELETE stored procedure by completing the following steps:
- Query the LSW_BPD_INSTANCE table for all closed instances that fall within a date range.
- Input the resulting instances into the stored procedure.
You can get the date from the LAST_MODIFIED_DATETIME column in the LSW_BPD_INSTANCE table. Closed instances are those whose EXECUTION_STATUS is 2. For a full explanation of the EXECUTION_STATUS values that you can query, see the "What do the EXECUTION_STATUS values represent" document.
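The selection step can be sketched as follows. This is a minimal illustration, not product code: it runs against an in-memory SQLite stand-in table that holds only the three columns named in this document, and the helper name `find_closed_instance_ids` and the 30-day cutoff are assumptions made for the example. On a real Process Server database, the same WHERE clause would be run with the vendor's own driver and date types.

```python
import sqlite3
from datetime import datetime, timedelta

CLOSED_STATUS = 2  # EXECUTION_STATUS value for closed instances, per the text above


def find_closed_instance_ids(conn, cutoff):
    """Return IDs of closed instances last modified before `cutoff` (hypothetical helper)."""
    rows = conn.execute(
        "SELECT BPD_INSTANCE_ID FROM LSW_BPD_INSTANCE "
        "WHERE EXECUTION_STATUS = ? AND LAST_MODIFIED_DATETIME < ? "
        "ORDER BY BPD_INSTANCE_ID",
        (CLOSED_STATUS, cutoff.isoformat(sep=" ")),
    ).fetchall()
    return [row[0] for row in rows]


# Stand-in table with only the columns used here (not the real product schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LSW_BPD_INSTANCE ("
    "BPD_INSTANCE_ID INTEGER PRIMARY KEY, "
    "EXECUTION_STATUS INTEGER, "
    "LAST_MODIFIED_DATETIME TEXT)"
)
now = datetime(2024, 1, 31)  # fixed date so the example is reproducible
conn.executemany(
    "INSERT INTO LSW_BPD_INSTANCE VALUES (?, ?, ?)",
    [
        (1, 2, (now - timedelta(days=90)).isoformat(sep=" ")),  # closed, old
        (2, 1, (now - timedelta(days=90)).isoformat(sep=" ")),  # old, status is not 2
        (3, 2, (now - timedelta(days=5)).isoformat(sep=" ")),   # closed, recent
        (4, 2, (now - timedelta(days=45)).isoformat(sep=" ")),  # closed, old
    ],
)

# Closed instances untouched for more than 30 days are the delete candidates.
old_closed_ids = find_closed_instance_ids(conn, now - timedelta(days=30))
print(old_closed_ids)
```

Each returned ID would then be passed to LSW_BPD_INSTANCE_DELETE. The call syntax for the stored procedure depends on the database vendor and is not shown here.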
The following query shows the distribution of BPD instances and their status.
select code.NAME, COUNT(bpd.EXECUTION_STATUS)
from LSW_BPD_INSTANCE bpd right join lsw_bpd_status_codes code on code.STATUS_ID = bpd.execution_status
group by code.NAME
order by code.NAME
The following query shows the distribution of all tasks and their status.
select code.NAME, COUNT(t.STATUS)
from lsw_task t right join LSW_TASK_STATUS_CODES code on code.STATUS_VALUE = t.STATUS
group by code.NAME
order by code.NAME
Note: The Cleanup Utility, which is provided in the Lombardi Teamworks Admin Console, removes task data only; not all of the business process definition instance data. As you can see from the previous list of tables, the LSW_BPD_INSTANCE_DELETE stored procedure deletes both the instance and task data that is associated with the business process definition. Thus, it is a much more thorough way to clean out business process definition instances. If you are using stand-alone services, you also might want to run the Cleanup Utility after running the LSW_BPD_INSTANCE_DELETE stored procedure.
You might want to have your database administrator construct a recurring job that queries for, then deletes, the instances that you need to delete.
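A recurring job of this kind might look like the following sketch. It is an assumption-laden outline rather than a supported implementation: `purge_closed_instances`, `delete_from_stand_in`, and the `batch_size` parameter are hypothetical names, the tables are SQLite stand-ins with only the columns needed here, and the per-batch commit is a design choice for the example, not a documented product requirement. In a real job, the `delete_instance` hook would invoke LSW_BPD_INSTANCE_DELETE for the given instance.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_closed_instances(conn, delete_instance, cutoff, batch_size=100):
    """Query for closed instances older than `cutoff`, then delete them in batches.

    `delete_instance(conn, instance_id)` is a hypothetical hook; against a real
    Process Server database it would call LSW_BPD_INSTANCE_DELETE.
    """
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT BPD_INSTANCE_ID FROM LSW_BPD_INSTANCE "
            "WHERE EXECUTION_STATUS = 2 AND LAST_MODIFIED_DATETIME < ? "
            "ORDER BY BPD_INSTANCE_ID",
            (cutoff.isoformat(sep=" "),),
        )
    ]
    for start in range(0, len(ids), batch_size):
        for instance_id in ids[start:start + batch_size]:
            delete_instance(conn, instance_id)
        conn.commit()  # commit per batch so each transaction stays small
    return len(ids)


def delete_from_stand_in(conn, instance_id):
    # Stand-in for the stored procedure: removes the tasks, then the instance.
    conn.execute("DELETE FROM LSW_TASK WHERE BPD_INSTANCE_ID = ?", (instance_id,))
    conn.execute("DELETE FROM LSW_BPD_INSTANCE WHERE BPD_INSTANCE_ID = ?", (instance_id,))


# Stand-in tables (not the real product schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LSW_BPD_INSTANCE (BPD_INSTANCE_ID INTEGER PRIMARY KEY, "
    "EXECUTION_STATUS INTEGER, LAST_MODIFIED_DATETIME TEXT)"
)
conn.execute("CREATE TABLE LSW_TASK (TASK_ID INTEGER PRIMARY KEY, BPD_INSTANCE_ID INTEGER)")
run_time = datetime(2024, 1, 31)  # fixed date so the example is reproducible
old = (run_time - timedelta(days=60)).isoformat(sep=" ")
recent = (run_time - timedelta(days=3)).isoformat(sep=" ")
conn.executemany(
    "INSERT INTO LSW_BPD_INSTANCE VALUES (?, ?, ?)",
    [(1, 2, old), (2, 1, old), (3, 2, recent), (4, 2, old)],
)
conn.executemany("INSERT INTO LSW_TASK VALUES (?, ?)", [(10, 1), (11, 1), (12, 2), (13, 4)])
conn.commit()

deleted = purge_closed_instances(
    conn, delete_from_stand_in, run_time - timedelta(days=30), batch_size=1
)
remaining_instances = [
    r[0] for r in conn.execute("SELECT BPD_INSTANCE_ID FROM LSW_BPD_INSTANCE ORDER BY 1")
]
remaining_tasks = [r[0] for r in conn.execute("SELECT TASK_ID FROM LSW_TASK ORDER BY 1")]
print(deleted, remaining_instances, remaining_tasks)
```

The job itself would be scheduled with whatever mechanism the database administrator already uses, such as a database scheduler or an operating system scheduler.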
Answers to common questions and concerns
- Why is deleting old data necessary?
When an instance completes and all of its associated tasks are closed, no further work is possible with this instance. You cannot restart it and assign it to someone, or edit old work. When a user logs in to the portal, various tables are queried to gather data on the active tasks for that user. This operation involves full table scans. Even if only 35% of the data is relevant, the scan still reads all of it, so pulling the tasks needed for that user takes a while. If the other 65% is deleted, there is less data to scan.
- Does this process affect historical data?
Only older history items are affected when you search for them from your inbox. When you run the delete queries, you can choose to delete only completed tasks that are older than, for example, 30 days. Any data that you really need should be either in the performance database or stored in some other system of record for auditing or other metrics.
- What happens if you do not delete the old closed instances?
- Slow performance on the portal occurs and potentially increases to an unusable state.
- Database size increases unchecked, which increases backup time and disk space usage.
- How often do you need to run the cleanup stored procedure?
This frequency depends on how many instances are closed in a given time period (week, month), how large the data is in each task (large execution context, large document attachment), and how many tasks exist per instance. The two largest areas for growth are documents and execution context. If you have many documents and you need to reference them later, a third-party document management solution is worth considering. Execution context is all the data carried from task to task. If the solution has many variables with large amounts of data, this scenario quickly consumes database space. In this case, reviewing your solution is a good idea to reduce the amount of overhead in the application.
- When should the procedure be run?
Run the procedure during an off-peak period or maintenance window. Purging thousands of instances and tasks might put a strain on the LSW_TASK and LSW_BPD_INSTANCE tables. Because these are core product tables, running a cleanup job outside of normal business hours is a good practice.
- A runaway process caused hundreds or thousands of tasks or instances to be created. Can I use this procedure to clean up these tasks?
Yes, you can use the stored procedure to clean up these tasks and instances safely. If the task has an event-based Undercover Agent (UCA) associated with it, the UCA makes up to five attempts to contact the task. After the last attempt, the UCA stops and no longer runs. For this scenario, no user action is needed.
- How can you determine what tables are storing the most data?
Contact your database administrator to determine the total size of the tables that are mentioned at the beginning of this document. The procedure is different for each database vendor. The following queries give a brief overview of whether the growth is due to the number of rows or to rows that hold large data. They use Microsoft SQL Server syntax (top, datalength), so adapt them for other databases.
Execution Context - large variable in tasks
select top 1000 snapshot_id, user_id, bpd_instance_id, subject, datalength(execution_context) as ObjectSize
from lsw_task inner join LSW_TASK_EXECUTION_CONTEXT on lsw_task_execution_context.task_id = lsw_task.task_id
order by ObjectSize desc

select top 1000 instance_name, lsw_bpd_instance.bpd_instance_id, datalength(data) as ObjectSize
from lsw_bpd_instance inner join lsw_bpd_instance_data on lsw_bpd_instance.bpd_instance_id = lsw_bpd_instance_data.bpd_instance_id
order by ObjectSize desc
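The two cases that the answer distinguishes, many rows versus a few rows with large data, can also be told apart with one aggregate per table. The following sketch assumes an in-memory SQLite stand-in for LSW_BPD_INSTANCE_DATA with made-up sample rows. It uses SQLite's LENGTH function, which returns the byte count for BLOB values; on SQL Server the equivalent is datalength, as in the queries above.

```python
import sqlite3

# Stand-in table with made-up rows (not the real product schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LSW_BPD_INSTANCE_DATA (BPD_INSTANCE_ID INTEGER, DATA BLOB)")
conn.executemany(
    "INSERT INTO LSW_BPD_INSTANCE_DATA VALUES (?, ?)",
    [(1, b"x" * 100), (2, b"x" * 200), (3, b"x" * 50_000)],
)

# Row count, total bytes, and largest single row in one pass over the table.
row_count, total_bytes, max_bytes = conn.execute(
    "SELECT COUNT(*), SUM(LENGTH(DATA)), MAX(LENGTH(DATA)) "
    "FROM LSW_BPD_INSTANCE_DATA"
).fetchone()

average_bytes = total_bytes / row_count
print(f"rows={row_count} total={total_bytes} average={average_bytes:.0f} max={max_bytes}")
```

A maximum that is far above the average suggests that a few rows with large data dominate the table. A high row count with a small average suggests that growth comes from the number of rows.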
- Is there a way to archive data rather than deleting data?
There is no out-of-the-box method to archive data to a separate database or table structure. Before running the delete process, you can copy the data into a custom-built table outside of the product schema. There is currently an enhancement request for archiving data.
- Are there links to other performance related articles and best practices?
- Webcast replay: WebSphere Lombardi Edition - Developing for Performance and Scalability
- Collapsing System Lane Activity
- A looping or runaway business process definition (BPD) or service might occur in IBM Business Process Manager (BPM). If a loop occurs, the stored procedure can delete the data. This article contains some SQL to find the instances and the tasks of the looping event.
|Business Integration||IBM Business Process Manager Standard||General||AIX, Linux, Solaris, Windows||8.0.1, 8.0, 7.5.1, 7.5|
|Business Integration||IBM Business Process Manager Advanced||General||AIX, Linux, Solaris, Windows||8.0.1, 8.0, 7.5.1, 7.5|
|Business Integration||IBM Business Process Manager Express||General||Linux, Windows||8.0.1, 8.0, 7.5.1, 7.5|