Restarting batch jobs
Stopping and restarting batch jobs are essential functions when processing large volumes of data.
The batch processor can restart the following types of stopped batch jobs:
- Batch jobs that were stopped by the runbatch.sh -stop <processId> command.
- Batch jobs that stopped gracefully due to manual intervention (updateTask). The status of these batch jobs is Stopped.
- Batch jobs that stopped gracefully due to a system error. The status of these batch jobs is Stopped.
- Batch jobs that stopped unexpectedly due to system failure (crash). The status of these batch jobs depends on the state the task was in when the failure occurred:
  - If the job was in staging, the status is Pending.
  - If the job was being processed, the status is In Progress.
  - If the job was in the process of being stopped but had not yet stopped, the status is Stopping.
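The status a crashed job is left in depends only on the phase it was in, so the list above amounts to a small lookup table. The Java sketch below shows that table; CrashStatus and Phase are hypothetical names for illustration, not InfoSphere MDM classes, and only the status strings come from this topic.

```java
import java.util.Map;

/**
 * Hypothetical sketch: the status a task is left in after a crash, keyed by
 * the phase the task was in when the failure occurred. CrashStatus and Phase
 * are illustrative names, not InfoSphere MDM classes.
 */
class CrashStatus {

    /** Phase of the task at the moment of the crash (illustrative). */
    enum Phase { STAGING, PROCESSING, STOPPING }

    // Status values taken from the documented list of crash states.
    private static final Map<Phase, String> STATUS_AFTER_CRASH = Map.of(
            Phase.STAGING, "Pending",
            Phase.PROCESSING, "In Progress",
            Phase.STOPPING, "Stopping");

    static String statusAfterCrash(Phase phase) {
        return STATUS_AFTER_CRASH.get(phase);
    }

    public static void main(String[] args) {
        for (Phase phase : Phase.values()) {
            System.out.println(phase + " -> " + statusAfterCrash(phase));
        }
    }
}
```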
To restart a stopped task, update the task status to Pending by either:
- Sending an updateTask XML transaction request to InfoSphere® MDM with a start task action code.
- Running the runbatch.sh -start <processId> command.
The batch processor then picks up the pending task and attempts to restart it from the point where it stopped.
The batch processor can restart a batch job at the correct point because it keeps three detailed status files for each task: a Stage file, a Result file, and a Restart file.
- Stage file
When the batch processor starts a new batch job, it creates a Stage file. The name of the Stage file is based on the process ID, task name, and task ID. For example, if the process ID is 15787858, the task name is Persist Entities, and the task ID is 680132805003874901, then the Stage file name is 15787858_Persist Entities_680132805003874901_stage.
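The naming convention, which the Result and Restart files described below also follow, can be sketched as a small helper that joins the parts with underscores. StatusFileNames and Kind are hypothetical names; the suffixes stage, result, and restart are taken from the example file names in this topic, and the task name is used verbatim, including any spaces.

```java
/**
 * Hypothetical sketch of the status-file naming convention:
 * <processId>_<taskName>_<taskId>_<suffix>. These helpers are illustrative,
 * not InfoSphere MDM APIs.
 */
class StatusFileNames {

    /** Suffixes taken from the example Stage, Result, and Restart file names. */
    enum Kind {
        STAGE("stage"), RESULT("result"), RESTART("restart");

        final String suffix;

        Kind(String suffix) {
            this.suffix = suffix;
        }
    }

    static String fileName(String processId, String taskName, String taskId, Kind kind) {
        // The task name is kept as-is, so "Persist Entities" keeps its space.
        return String.join("_", processId, taskName, taskId, kind.suffix);
    }

    public static void main(String[] args) {
        System.out.println(fileName("15787858", "Persist Entities",
                "680132805003874901", Kind.STAGE));
        // prints 15787858_Persist Entities_680132805003874901_stage
    }
}
```

The IDs are handled as strings so that the helper never has to reason about numeric width; the example task ID would also fit in a Java long.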
The first line of the Stage file is a title. Each line after the title stores a message ID and record. The message ID is a unique, sequential number generated by the batch processor at runtime to identify each record involved in the job. The types of information for each record in the Stage file depend on what is defined in the METADATA_KEY of the CDMETADATAINFOTP code table for the task.
For example:
MessageID,ENTITY_ID,ENTITY_TYPE
1,100000000000000001,mdmper
2,100000000000000002,mdmper
3,100000000000000003,mdmper
4,100000000000000004,mdmper
Alternate example:
MessageID,NO_TITLE_LINEENTITY_ID
1,<?xml version="1.0" encoding="UTF-8"?><TCRMService ...
2,<?xml version="1.0" encoding="UTF-8"?><TCRMService ...
3,<?xml version="1.0" encoding="UTF-8"?><TCRMService ...
4,<?xml version="1.0" encoding="UTF-8"?><TCRMService ...
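Reading a Stage file back amounts to skipping the title line and splitting each remaining line at its first comma. The sketch below assumes the layout shown in the examples; StageFileReader is a hypothetical name. Only the first comma is treated as the separator, because the record itself can contain commas (ENTITY_ID,ENTITY_TYPE in the first example, XML in the second).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of reading Stage-file contents, assuming a title line
 * followed by one "messageId,record" line per record.
 */
class StageFileReader {

    static Map<Long, String> parse(List<String> lines) {
        // LinkedHashMap keeps records in message-ID (file) order.
        Map<Long, String> records = new LinkedHashMap<>();
        // The first line is the title, so start at index 1.
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isBlank()) {
                continue;
            }
            int comma = line.indexOf(',');
            if (comma < 0) {
                throw new IllegalArgumentException("Malformed stage line: " + line);
            }
            long messageId = Long.parseLong(line.substring(0, comma).trim());
            // Everything after the first comma is the record, commas included.
            records.put(messageId, line.substring(comma + 1));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> stage = List.of(
                "MessageID,ENTITY_ID,ENTITY_TYPE",
                "1,100000000000000001,mdmper",
                "2,100000000000000002,mdmper");
        parse(stage).forEach((id, record) -> System.out.println(id + " -> " + record));
    }
}
```

In practice the lines could come from java.nio.file.Files.readAllLines(path); the file encoding is an assumption to check against your installation.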
- Result file
The batch processor records the results of each job in a Result file. Similar to the Stage file, the Result file name is based on the process ID, task name, and task ID, such as 15787858_Persist Entities_680132805003874901_result.
The Result file stores the unique message ID of each record in the batch job along with a result category to represent the outcome of the processing for that record:
- F represents a failed outcome.
- S represents a successful outcome.
Each line in the Result file represents a different record. For example:
1,S
2,F
3,S
4,S
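A Result file in this layout can be read back into a map from message ID to outcome. The sketch below assumes there is no title line (none appears in the example) and that the category is the single letter S or F; ResultFileReader and Outcome are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of reading Result-file contents: one
 * "messageId,category" line per processed record, no title line.
 */
class ResultFileReader {

    enum Outcome { S, F }

    static Map<Long, Outcome> parse(List<String> lines) {
        Map<Long, Outcome> results = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            String[] parts = line.split(",", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed result line: " + line);
            }
            // Outcome.valueOf rejects anything other than "S" or "F".
            results.put(Long.parseLong(parts[0].trim()), Outcome.valueOf(parts[1].trim()));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<Long, Outcome> results = parse(List.of("1,S", "2,F", "3,S", "4,S"));
        long failures = results.values().stream().filter(o -> o == Outcome.F).count();
        System.out.println(results.size() + " processed, " + failures + " failed");
        // prints 4 processed, 1 failed
    }
}
```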
The batch processor decides whether to mark a processing outcome as a success or a failure by using the result categorizer class that is specified by the resultCategorizer property in the Batch.properties file. The default setting is:
resultCategorizer=com.ibm.mdm.batchframework.message.BatchMessageCategorizer
The BatchMessageCategorizer class determines the message outcome based on whether the transaction results in a DWLResponseException message. If it does, the outcome is a failure (F); otherwise, the outcome is a success (S).
Tip: If the behavior of the default BatchMessageCategorizer class is not appropriate for your implementation, you can use the ResultCodeMessageCategorizer class instead. Change the resultCategorizer property as follows:
resultCategorizer=com.ibm.mdm.batchframework.bulkprocessing.restart.ResultCodeMessageCategorizer
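The two categorization rules can be compared side by side in a short sketch: the exception-based rule marks a record F when the transaction produces a DWLResponseException, and the result-code rule marks it S only when the <ResultCode> value in the response is SUCCESS. Everything named here is hypothetical: the Categorizer interface, the Response holder, and StandInResponseException (a stand-in for DWLResponseException, which is only available inside MDM) are illustrations, not the signatures of the real BatchMessageCategorizer or ResultCodeMessageCategorizer classes. The XML in the usage example is a minimal made-up response, not the full TCRMService format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of the two categorization rules; not MDM APIs. */
class CategorizerSketch {

    /** Stand-in for DWLResponseException (assumption for illustration). */
    static class StandInResponseException extends Exception {
        StandInResponseException(String message) {
            super(message);
        }
    }

    /** What a processed transaction produced: response XML and/or an exception. */
    static final class Response {
        final String xml;      // may be null
        final Exception error; // may be null

        Response(String xml, Exception error) {
            this.xml = xml;
            this.error = error;
        }
    }

    interface Categorizer {
        char categorize(Response response);
    }

    /** Exception-based rule: F if the response exception occurred, otherwise S. */
    static final Categorizer EXCEPTION_BASED =
            response -> response.error instanceof StandInResponseException ? 'F' : 'S';

    /** Result-code rule: S only if the first <ResultCode> is SUCCESS, otherwise F. */
    static final Categorizer RESULT_CODE_BASED = response -> {
        if (response.xml == null) {
            return 'F';
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            response.xml.getBytes(StandardCharsets.UTF_8)));
            NodeList codes = doc.getElementsByTagName("ResultCode");
            boolean success = codes.getLength() > 0
                    && "SUCCESS".equals(codes.item(0).getTextContent().trim());
            return success ? 'S' : 'F';
        } catch (Exception e) {
            // This sketch treats an unparseable response as a failure.
            return 'F';
        }
    };

    public static void main(String[] args) {
        Response success = new Response("<TxResult><ResultCode>SUCCESS</ResultCode></TxResult>", null);
        // Any ResultCode value other than SUCCESS counts as a failure.
        Response other = new Response("<TxResult><ResultCode>OTHER</ResultCode></TxResult>", null);
        Response rejected = new Response(null, new StandInResponseException("rejected"));

        System.out.println(RESULT_CODE_BASED.categorize(success)); // S
        System.out.println(RESULT_CODE_BASED.categorize(other));   // F
        System.out.println(EXCEPTION_BASED.categorize(rejected));  // F
        System.out.println(EXCEPTION_BASED.categorize(success));   // S
    }
}
```

The practical difference is what counts as failure: the exception-based rule accepts any response that did not raise the exception, while the result-code rule requires an explicit SUCCESS value.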
The ResultCodeMessageCategorizer class determines the message outcome based on the value of the <ResultCode> tag in its response output. If the value is SUCCESS, the outcome is S; otherwise, the outcome is F.
- Restart file
When the batch processor restarts a batch job, it creates a Restart file by comparing the Stage file to the Result file and determining the list of remaining, unprocessed records. Similar to the Stage and Result files, the Restart file name is based on the process ID, task name, and task ID, such as 15787858_Persist Entities_680132805003874901_restart.
The Restart file has the same format as the Stage file and contains a subset of it: the entities in the Stage file, minus a subset of the entities in the Result file.
The batch processor uses the Restart file as an input file to process the remaining entities in the restarted batch job.
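The comparison that produces the Restart file can be sketched as a set difference over message IDs. This is a minimal sketch, not the batch processor's implementation; RestartFileBuilder and its retryFailures flag are hypothetical. The Restart file is described as the remaining, unprocessed records, but this topic does not say whether a record marked F in the Result file counts as processed, so the flag makes that choice explicit: false skips every record that has any outcome, true sends failed records again.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of building Restart-file contents from Stage-file and
 * Result-file lines. Assumes "messageId,..." lines in both files and a title
 * line only in the Stage file.
 */
class RestartFileBuilder {

    static List<String> build(List<String> stageLines, List<String> resultLines,
                              boolean retryFailures) {
        // Message IDs that should not be sent again.
        Set<String> done = new HashSet<>();
        for (String line : resultLines) {
            if (line.isBlank()) {
                continue;
            }
            String[] parts = line.split(",", 2);
            String category = parts.length > 1 ? parts[1].trim() : "";
            // Skip a record if it succeeded, or if it has any outcome and
            // failed records are not being retried.
            if (!retryFailures || category.equals("S")) {
                done.add(parts[0].trim());
            }
        }

        List<String> restart = new ArrayList<>();
        if (stageLines.isEmpty()) {
            return restart;
        }
        restart.add(stageLines.get(0)); // same title line as the Stage file
        for (int i = 1; i < stageLines.size(); i++) {
            String line = stageLines.get(i);
            if (line.isBlank()) {
                continue;
            }
            int comma = line.indexOf(',');
            String messageId = (comma < 0 ? line : line.substring(0, comma)).trim();
            if (!done.contains(messageId)) {
                restart.add(line);
            }
        }
        return restart;
    }

    public static void main(String[] args) {
        List<String> stage = List.of(
                "MessageID,ENTITY_ID,ENTITY_TYPE",
                "1,100000000000000001,mdmper",
                "2,100000000000000002,mdmper",
                "3,100000000000000003,mdmper",
                "4,100000000000000004,mdmper");
        // Suppose the job stopped after writing outcomes for records 1 and 2.
        List<String> result = List.of("1,S", "2,F");

        build(stage, result, false).forEach(System.out::println); // title, 3, 4
        build(stage, result, true).forEach(System.out::println);  // title, 2, 3, 4
    }
}
```

Keeping the Stage file's title line means the output can be read with the same logic as a Stage file, which matches the statement that both files share a format.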