AWSJDB801E error received - MakePlan or UpdateStats jobs run a long time

Technote (troubleshooting)


Problem(Abstract)

The AWSJDB801E error is received during MakePlan or UpdateStats and the execution time of MakePlan or UpdateStats is very slow and takes a long time to complete.

Symptom

The Tivoli Workload Scheduler (TWS) MakePlan job takes a long time to complete

or

MakePlan or UpdateStats job(s) return the following error:

AWSJDB801E An internal error has been found while accessing the database. The internal
error message is: "The transaction log for the database is full.".

Cause

From the analysis of the preproduction plan, it is possible to identify that more completed job stream instances are present in the plan, and in particular that some "zombie" job stream instances have been generated for some failures of the forecast plans. These job stream instances are defined "zombie" because the job stream instances related to the forecast should not exist in the production plan.


The "zombie" job stream instances, due to forecast plans, block the cleanup procedure performed each day by JnextPlan (FINAL schedule).

For this reason, when FINAL runs, it adds the new job stream instances for the day and does not remove the old instances. This increases the size of the prepoduction plan table and causes a decreasing of the performace of the MakePlan and UpgradeStats jobs.

These "zombies" are present in the database when during concurrent multiple forecast operations or during concurrent changes of the objects and forecast plan, a deadlock is generated in the database. This issue is fixed with the APAR IZ69984: http://www-01.ibm.com/support/docview.wss?uid=swg1IZ69984

When JnextPlan runs, during the MakePlan step, the job streams are included in the Symphony file and added in the preproduction plan (this is the mdl.jsi_job_stream_instances table in the database).

When the job streams are completed in the Symphony, during the UpdateStats job the status is updated and they are marked as completed.

Each day JnextPlan calculates the start time of the first not complete preproduction plan job stream instance and, according to this value, it performs the cleanup procedure removing the completed job stream instances.

Instead, when the forecast plan runs, a similar MakePlan process creates a temporary preproduction plan that is removed when the forecast finishes.

1. Start time of first not complete preproduction plan job stream instance: 06/20/2009 13:00 TZ EDT
2. This job stream instance is the following instance in the database:

367171. 0AAAAAAAAAAALGSD 06/20/2009 2009-06-20-21.00.00.000000 N ' x'090074B79CC438619BC2EEDA2ABBE63E'

and this instance does not exist in the Symphony file

3. The ' x'090074B79CC438619BC2EEDA2ABBE63E' is the PLAN_ID. The not Null PLAN_ID means that it refers to a forecast plan.

The result is that the JnextPlan calculates a wrong start time of the first job stream instance and does not perform the cleanup procedure.


Diagnosing the problem

To determine if this Technote can be applied in your scenario, run the following:


planman showinfo

If the Start time of the "first not complete preproduction plan job stream instance" is very old, then you need to check for the presence of this instance inside the Symphony file.

Example: Start time of first not complete preproduction plan job stream instance: 06/20/2009 13:00 TZ EDT <----date is many months old

run the following:

db2 "select * from mdl.jsi_job_stream_instances order by JSI_SCHEDULED_DATE"

it returns with entries like the following:

367171. 0AAAAAAAAAAALGSD 06/20/2009 2009-06-20-21.00.00.000000 N ' -'

Check in the Symphony file for the presence of JS_ID as 0AAAAAAAAAAALGSD using conman show schedule.

Or use the Tivoli Dynamic Workload Console (TDWC) or the Job Scheduling Console (JSC) to perform the check in the Symphony. If this instance exists, determine the reason for this instance (is it still a valid schedule instance) and whether you can release or remove it with the conman release command or using conman delete.

The next JnextPlan will automatically perform the cleanup procedure.

If it does not exist, continue with this Technote.


Resolving the problem

To remove the "zombie" job stream instances, the TWS IBM support team proposes three different procedures:


1. Manually remove the zombie forecast plan. This unlocks the cleanup procedure for JnextPlan.

WARNING: No plan (JnexPlan or Final job stream), Forecast, or Trial plan generation can run during these operations.

IMPACT: The next JnextPlan execution could take more time to perform the cleanup operation of the completed job stream instances of the production plan. Moreover it could fail for a transaction log not correctly sized.

The query to delete the zombie job stream instances related to the forecast is the following:

db2 "delete from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is not null"

The procedure to increase the transaction log is the following:
http://publib.boulder.ibm.com/infocenter/tivihelp/v3r1/topic/com.ibm.tivoli.itws.doc_8.4.1/msgmst169.htm#increaselogs


2. Manually remove the "zombie" job stream instances and also manually perform the cleanup operations performed by JnextPlan.

WARNING: No plan (JnexPlan or Final job stream), Forecast, or Trial plan generation can run during these operations.

IMPACT: No impact regarding the next JnextPlan, but in this case it is suggested to let the DB administrator to perform the delete operations.

In case the DB administrator is not available, it is suggested to backup the database before performing this operation.


2a. Define the correct "Start time of first not complete preproduction plan job stream instance":

db2 "select (min(JSI_START_TIME) - 7 DAYS) from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_STATUS = 'N'"

For example, if the value is: 2010-02-05-13.00.00.00000 the next steps will allow the removal of all the job stream instances older than this value.

This could be done in a single sql statement, but the following steps describe how to perform the cleanup in batches, in order to have more control over the transactions

2b. Count the completed job stream instances:
db2 "select count(*) from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_START_TIME < '2010-02-05-13.00.00.00000' and JSI_STATUS <> 'N'"

2c. Delete the instances already completed: this means NO IMPACT on the job stream instances already scheduled and in execution mode:

delete from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_START_TIME<'2009-06-01 13:00:00' and JSI_STATUS<>'N'
delete from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_START_TIME<'2009-07-01 13:00:00' and JSI_STATUS<>'N'
delete from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_START_TIME<'2009-08-01 13:00:00' and JSI_STATUS<>'N'
delete from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_START_TIME<'2009-09-01 13:00:00' and JSI_STATUS<>'N'
delete from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_START_TIME<'2009-10-01 13:00:00' and JSI_STATUS<>'N'
delete from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_START_TIME<'2009-11-01 13:00:00' and JSI_STATUS<>'N'
delete from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_START_TIME<'2009-12-01 13:00:00' and JSI_STATUS<>'N'
delete from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_START_TIME<'2010-01-01 13:00:00' and JSI_STATUS<>'N'
delete from MDL.JSI_JOB_STREAM_INSTANCES where PLAN_ID is null and JSI_START_TIME<'2010-02-05 13:00:00' and JSI_STATUS<>'N'


3. Use the utility implemented by the TWS L3 support team to automatically perform the deletions.
Before using the automatic cleanup procedure, it is strongly recommended to test it in a development environment.


cleanupInstances.zip

Rate this page:

(0 users)Average rating

Add comments

Document information


More support for:

Tivoli Workload Scheduler

Software version:

8.3, 8.4, 8.5, 8.5.1

Operating system(s):

Platform Independent

Reference #:

1426128

Modified date:

2013-07-12

Translate my page

Machine Translation

Content navigation