IBM Support

Recovery procedure if 'conman start' is issued while 'stageman' is running in SWITCHPLAN job.

Troubleshooting


Problem

Normally, the stageman command issued by the SWITCHPLAN job runs so quickly that there is no opportunity to issue 'conman start' while it is underway. However, if 'conman start' is issued while 'stageman' is running, the resulting Sinfonia and Symphony files will be corrupt.

Symptom

* FINAL.SWITCHPLAN in the archived plan shows SUCC rather than EXEC

* TWS/audit/plan/<DATE> file has a record for 'conman start' after 'conman stop' is issued and before stageman completes

* TWS/stdlist/traces/<DATE>_TWSMERGE.log may have an entry like this:
BATCHMAN:AWSBHT057W Batchman has found a non-valid run number in the Symphony file for the following record type: "Cs" and object: "".
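
The log check in the last symptom can be scripted. The following is a minimal sketch, not an IBM-supplied tool: it assumes only that the TWSMERGE log is a text file and that the message ID AWSBHT057W appears literally on the affected lines. The function names find_message and scan_log are hypothetical.

```python
MESSAGE_ID = "AWSBHT057W"  # message ID quoted in the symptom above


def find_message(lines, message_id=MESSAGE_ID):
    """Return (line_number, text) pairs for every line containing message_id."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if message_id in line
    ]


def scan_log(path, message_id=MESSAGE_ID):
    """Scan one log file, for example TWS/stdlist/traces/<DATE>_TWSMERGE.log."""
    # errors="replace" keeps the scan going if the log holds undecodable bytes.
    with open(path, errors="replace") as handle:
        return find_message(handle, message_id)
```

An empty result does not rule out the problem, because the symptom states only that the log may contain this entry.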

Cause

The command 'conman start' was issued manually while the 'stageman' command of the SWITCHPLAN job was executing.

Diagnosing The Problem

* Inspect the plan audit file: TWS/audit/plan/<DATE>. Look for 'conman stop', 'conman start', then 'stageman' entries in that order. If the timestamp for the 'conman start' entry is earlier than the 'stageman' audit record and is later than the date/time stamp on stageman's archive file, then that is positive evidence that 'conman start' was issued after stageman started and before it completed.

* Inspect the SWITCHPLAN joblog for calls to 'conman start'. There should be only one call to 'conman start' and it should be at the end of the joblog. If conman was already started, the following message will be found:
"...AWSBHT057W

AWSBHU014I <MASTER_CPU> already active.


%start
..."

* Commands like 'conman stop' will not return.

* The Symphony file stays the same size as the Sinfonia file.

The Symphony and Sinfonia files are corrupt in this situation. The Symphony file is never updated because batchman will not process it.
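
Two of the checks above reduce to simple comparisons. The following is a minimal sketch under stated assumptions: the three timestamps are read by hand from the plan audit file and from the date/time stamp of the stageman archive file (the audit record layout is not parsed here), and the function names are hypothetical.

```python
import os
from datetime import datetime


def start_overlapped_stageman(archive_time, start_time, stageman_record_time):
    """True if 'conman start' falls after the archive file's date/time stamp
    and before the 'stageman' audit record, the condition described above."""
    return archive_time < start_time < stageman_record_time


def plan_files_same_size(symphony_path, sinfonia_path):
    """True if Symphony has not grown beyond the size of Sinfonia."""
    return os.path.getsize(symphony_path) == os.path.getsize(sinfonia_path)
```

For example, start_overlapped_stageman(datetime(2018, 6, 17, 5, 59, 50), datetime(2018, 6, 17, 5, 59, 55), datetime(2018, 6, 17, 6, 0, 5)) returns True. The dates are illustrative only and are not taken from a real audit file.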

Resolving The Problem

This procedure will allow the normal SWITCHPLAN job processing to take place by restoring the Symphony file that was archived in TWS/schedlog/M<DATE/TIME> during the problematic SWITCHPLAN job.

1. Kill all TWS Engine processes:


netman
mailman
batchman
jobman
writer
monman

2. Rename the following files:
Symphony
Sinfonia
Jobtable
*.msg

3. Copy the latest archived plan from TWS/schedlog/M<DATE/TIME> to TWS/Symphony
*After the copy is performed, make sure that the Symphony file owner and group match the TWSUser's name and group.

4. Issue: conman start

5. Confirm that the TWS Engine processes are running.

6. Issue: conman "rerun final.switchplan"
*Note: This assumes that the FINAL.SWITCHPLAN job in the archived plan has status SUCC
**Note: There is no need to rerun the MAKEPLAN job because the Symnew file is valid.
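
Steps 2 and 3 of the procedure above can be sketched as follows. This is an illustration under stated assumptions, not an IBM-supplied script: it assumes that all TWS Engine processes are already stopped (step 1), that the *.msg files to rename sit directly in the TWS home directory, and that the "latest" archived plan is the schedlog/M* file with the newest modification time. The function names and the rename suffix are hypothetical. The 'conman start' and rerun commands in steps 4 through 6 are still issued by hand.

```python
import shutil
import time
from pathlib import Path


def set_aside_plan_files(tws_home, suffix=None):
    """Step 2: rename Symphony, Sinfonia, Jobtable and *.msg by appending a suffix."""
    home = Path(tws_home)
    suffix = suffix or time.strftime(".bad_%Y%m%d%H%M%S")
    targets = [home / name for name in ("Symphony", "Sinfonia", "Jobtable")]
    targets += sorted(home.glob("*.msg"))  # assumption: only the home directory
    renamed = []
    for path in targets:
        if path.is_file():
            new_path = path.with_name(path.name + suffix)
            path.rename(new_path)
            renamed.append(new_path)
    return renamed


def latest_archived_plan(tws_home):
    """Pick the newest schedlog/M* file by modification time (an assumption)."""
    archives = [p for p in (Path(tws_home) / "schedlog").glob("M*") if p.is_file()]
    if not archives:
        raise FileNotFoundError("no archived plan found in schedlog")
    return max(archives, key=lambda p: p.stat().st_mtime)


def restore_archived_plan(tws_home, user=None, group=None):
    """Step 3: copy the latest archived plan to Symphony and optionally fix ownership."""
    home = Path(tws_home)
    source = latest_archived_plan(home)
    target = home / "Symphony"
    shutil.copy2(source, target)
    if user or group:
        # Owner and group must match the TWS user; changing them needs privileges.
        shutil.chown(target, user=user, group=group)
    return source
```

Renaming rather than deleting keeps the corrupt files available for later analysis. Confirm which archive was selected before running 'conman start', since the newest file by modification time is only a heuristic.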

Product: IBM Workload Scheduler
Component: Not Applicable
Platform: AIX, HP-UX, Linux, Solaris, Windows
Version: 8.6, 9.1, 9.2, 9.3, 9.4
Business Unit: Cloud & Data Platform
Line of Business: Automation

Document Information

Modified date:
17 June 2018

UID

swg22007697