PM95199: SOME REDUCE TASKS OF A PLATFORM SYMPHONY MAPREDUCE JOB COULD BE STUCK DUE TO CONNECTION TIMEOUT
Closed as program error.
When running large-scale Platform Symphony MapReduce jobs, some reduce tasks may hang. Connection timeout failures can break communication between the scheduler and reducer tasks. An affected reducer task then waits forever in the fetch stage because it never receives notification that the map tasks have finished.
Find the job with hanging reducer tasks, then suspend and resume the job using these PSMR commands:
soamcontrol session suspend MapReduce6.1:<jobid>
soamcontrol session resume MapReduce6.1:<jobid>
All hanging tasks are then retried.
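The suspend and resume steps above can be wrapped in a small shell helper. This is a minimal sketch, not part of the product: the function names bounce_session and run, and the APP and DRY_RUN variables, are hypothetical and introduced here for illustration only. It uses only the two soamcontrol invocations given above, and by default it prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Platform Symphony):
# suspend and then resume a MapReduce session so that hanging tasks are retried.

# Application name as used in the commands above; override via the environment.
APP="${APP:-MapReduce6.1}"
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Suspend the session for the given job ID, then resume it.
# Stops without resuming if the suspend step fails.
bounce_session() {
    jobid="$1"
    if [ -z "$jobid" ]; then
        echo "usage: bounce_session <jobid>" >&2
        return 2
    fi
    run soamcontrol session suspend "$APP:$jobid" || return 1
    run soamcontrol session resume "$APP:$jobid"
}
```

After sourcing the script, calling bounce_session with the ID of the affected job prints the two commands for review. Setting DRY_RUN=0 first runs them, assuming soamcontrol is on the PATH and you are already logged on with sufficient privileges.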
See Error Description
This problem is fixed in Version 18.104.22.168 and later fix packs.