A fix is available
APAR status
Closed as program error.
Error description
L3 believes that the Pump got the S0C4 because the input data for one of our modules was too long. The incorrect data possibly came from one of the Pump traps. This APAR eliminates the possibility of the S0C4 by adding an ARGS length check and issuing an error message with full diagnostic data instead of processing the arguments. With this fix the customer should never get the S0C4 again. If the problem recurs, the diagnostic data from the new message will identify the NAME and VALUE of the offending trap, which can then be fixed with a separate APAR if necessary. Contact Mikhail Bavbel, Event Pump L3, for more information.

Receiving the following errors in STC GTMPUMP:

04.00.01 STC34992 GTM2155I LOG PROCESSING HAS BEEN TERMINATED
04.00.01 STC34992 GTM2101I LOG PROCESSING IS AVAILABLE. DDNAME = ACC1LOGA
04.00.01 STC34992 GTM2102I DSNAME = SYSOUT(X)
04.17.04 STC34992 IEA995I SYMPTOM DUMP OUTPUT 817
817 SYSTEM COMPLETION CODE=0C4 REASON CODE=00000011
817 TIME=04.17.04 SEQ=10868 CPU=0000 ASID=003E
817 PSW AT TIME OF ERROR 078D0000 9FECEB98 ILC 2 INTC 11
817 ACTIVE LOAD MODULE ADDRESS=1FECD650 OFFSET=0000
817 NAME=GTMCTMON
817 DATA AT PSW 1FECEB92 - 50B018F1 18E30E0E 1817D203
817 GR 0: 1FF4724D 1: 00001DAD
817 2: 1F644000 3: 1FEC802B
817 4: 00005D82 5: 1FF430D8
817 6: 1FEC8018 7: 9F99F960
817 8: 00005F28 9: 1FEC9E20
817 A: 0000007C B: 1FECE85E
817 C: 1F71916C D: 1FEC9E20
817 E: 1FECC000 F: 00001DAD
817 END OF SYMPTOM DUMP
100 END OF SYMPTOM DUMP
2.00.02 STC34992 GTM2155I LOG PROCESSING HAS BEEN TERMINATED
2.00.02 STC34992 GTM2101I LOG PROCESSING IS AVAILABLE. DDNAME = ACC1LOGA
2.00.02 STC34992 GTM2102I DSNAME = SYSOUT(X)
5.04.22 STC34992 GTM5507E ENTRY PUT TO WLEDGE. COMMAND QUEUE FAILED
6.00.02 STC34992 GTM2155I LOG PROCESSING HAS BEEN TERMINATED
6.00.02 STC34992 GTM2101I LOG PROCESSING IS AVAILABLE. DDNAME = ACC1LOGA
6.00.02 STC34992 GTM2102I DSNAME = SYSOUT(X)
0.00.02 STC34992 GTM2155I LOG PROCESSING HAS BEEN TERMINATED
0.00.02 STC34992 GTM2101I LOG PROCESSING IS AVAILABLE. DDNAME = ACC1LOGA

These abends started on 02 May and happened repeatedly until 08 May, when Event Pump stopped sending events to OMNIbus. ZPUMP is running on all 3 LPARs in this plex, and the abends are happening on only one LPAR (CPUB). We are also receiving these messages in all of the GTMPUMP STCs:

GPZ0320W UNABLE TO LOCATE INTERVAL SETTINGS FOR MAINVIEW SYSP
GPZ0451I GPZZMVIN COMPLETED WITH RETURN CODE 4
GTM5507E ENTRY PUT TO REPLY COMMAND QUEUE FAILED
Local fix
Recycling or restarting the Event Pump should correct the event loss.
Problem summary
****************************************************************
* USERS AFFECTED: All Tivoli Event Pump users.                 *
****************************************************************
* PROBLEM DESCRIPTION: Tivoli Event Pump for z/OS can generate *
*                      GTM5507E and GTM5508E error messages in *
*                      system log. ABENDs S878 and/or S0C4 can *
*                      occur for a set of modules in Tivoli    *
*                      Event Pump address space. In some cases *
*                      Tivoli Event Pump may hang and not send *
*                      events to the distributed side.         *
****************************************************************
* RECOMMENDATION: Apply the PTF.                               *
****************************************************************

The Source Collector of Tivoli Event Pump for z/OS (or the Master address space, if the product runs in two address space mode) can generate GTM5507E and GTM5508E messages in SYSLOG.

...
GTM5507E ENTRY PUT TO TRASH COMMAND QUEUE FAILED
...
GTM5508E NO CORRESPONDING QUEUE FOR MASK TRASH* FOUND
...

The frequency of these messages is not fixed; it depends on the customization and load of the running system. Setting the Source Collector (Master address space) initial parameter WTOR_ALERTS=YES and monitoring jobs that generate WTOR messages significantly increases the frequency of GTM5507E and GTM5508E messages in the SYSLOG.
The GTMCTMON, GTMCTSRP, GTMIQSRG, GTMIQSRP, and GTMAOPD3 modules may fail with S878 and S0Cx ABENDs:

IEA705I ERROR DURING GETMAIN SYS CODE = 878-14 GTMPUMP GTMPUMP
IEA995I SYMPTOM DUMP OUTPUT 836
SYSTEM COMPLETION CODE=878 REASON CODE=00000014
TIME=05.00.30 SEQ=60038 CPU=0000 ASID=018B
IEA705I 00F5DB80 007C02E8 007C02E8 007C7000 FFFFD428
PSW AT TIME OF ERROR 070C1000 81592F7C ILC 2 INTC 0D
NO ACTIVE MODULE FOUND
NAME=UNKNOWN
DATA AT PSW 01592F76 - 00181610 0A0D98DC D0088910
GR 0: 84000000 1: 84878000
2: 00000004 3: 00007C00
4: 00000000 5: 00000005
6: 00000000 7: 00000878
8: 00000014 9: FFFFD428
A: 1FECE4A0 B: 00000878
C: 81592F44 D: 7FF15008
E: 9FED3B04 F: 00000014
END OF SYMPTOM DUMP

SYSTEM COMPLETION CODE=0C4 REASON CODE=00000011
TIME=04.17.04 SEQ=10868 CPU=0000 ASID=003E
PSW AT TIME OF ERROR 078D0000 9FECEB98 ILC 2 INTC 11
ACTIVE LOAD MODULE ADDRESS=1FECD650 OFFSET=00001548
NAME=GTMCTMON
DATA AT PSW 1FECEB92 - 50B018F1 18E30E0E 1817D203
GR 0: 1FF4724D 1: 00001DAD
2: 1F644000 3: 1FEC802B
4: 00005D82 5: 1FF430D8
6: 1FEC8018 7: 9F99F960
8: 00005F28 9: 1FEC9E20
A: 0000007C B: 1FECE85E
C: 1F71916C D: 1FEC9E20
E: 1FECC000 F: 00001DAD
END OF SYMPTOM DUMP

Depending on the product customization, these ABENDs may stop the processing of registered traps. The event flow from the Tivoli Event Pump to the distributed side may break. In some cases the whole Source Collector (or Master) address space may hang, and the system operator may be forced to CANCEL the whole address space.
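When matching a dump against this APAR, it can help to confirm that the PSW address really falls inside GTMCTMON. The following is a minimal Python sketch of that arithmetic, using only the values from the S0C4 symptom dump above. The function and constant names are illustrative and not part of any IBM tooling. It relies on the fact that in a 31-bit PSW the high-order bit of the address word is the addressing mode flag, not part of the address.

```python
# Values copied from the S0C4 symptom dump in this APAR.
PSW_ADDRESS_WORD = 0x9FECEB98     # second word of "PSW AT TIME OF ERROR"
LOAD_MODULE_ADDRESS = 0x1FECD650  # "ACTIVE LOAD MODULE ADDRESS"

ADDRESS_MASK_31 = 0x7FFFFFFF  # drops the high-order AMODE 31 flag bit


def failing_offset(psw_word: int, module_address: int) -> int:
    """Return the offset of the PSW instruction address within the module.

    Hypothetical helper for illustration only.
    """
    instruction_address = psw_word & ADDRESS_MASK_31
    return instruction_address - module_address


offset = failing_offset(PSW_ADDRESS_WORD, LOAD_MODULE_ADDRESS)
print(f"{offset:08X}")  # prints 00001548
```

The result agrees with the OFFSET=00001548 value reported in the dump for NAME=GTMCTMON.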
Problem conclusion
All Tivoli Event Pump traps are processed by the COMMAND TASK engine, which queues all requests and executes REXX execs or calls/attaches load modules sequentially, one by one. The buffer for the arguments of the requested exec or module was coded with a length of 128 bytes. Several traps, including WTOR traps, run execs or modules with arguments that are more than 128 bytes long. The resulting buffer overflow gave the COMMAND TASK engine invalid names for the queues where incoming requests should be executed. The argument buffer overflow was also the cause of the S878 and S0C4 ABENDs in the COMMAND TASK engine modules. The length of the REXX exec / load module argument buffer was increased to 256 bytes, and a check for buffer overflow was added to the code. Because the trap ACTION field is restricted to 256 bytes, this issue should not recur.
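The kind of check described above can be sketched as follows. This is an illustration in Python, not the product code, whose implementation is not shown in this APAR. The names `queue_trap_action` and `ArgsTooLongError`, the blank padding, and the wording of the diagnostic text are all assumptions. Only the 256-byte limit and the idea of rejecting oversized arguments with diagnostic data come from the APAR text.

```python
MAX_ARGS_LENGTH = 256  # argument buffer length after the fix, per this APAR


class ArgsTooLongError(ValueError):
    """Raised instead of copying oversized arguments into the buffer."""


def queue_trap_action(trap_name: str, args: bytes,
                      max_len: int = MAX_ARGS_LENGTH) -> bytes:
    """Return a fixed-length argument buffer, or fail with diagnostics.

    Hypothetical helper: checks the length before copying, so the data
    that follows the buffer (for example queue names) is never overwritten.
    """
    if len(args) > max_len:
        # Report the trap name and value rather than processing the data.
        raise ArgsTooLongError(
            f"trap {trap_name!r}: ARGS length {len(args)} exceeds "
            f"{max_len} bytes; value={args!r}"
        )
    # Pad with blanks to the fixed buffer length.
    return args.ljust(max_len, b" ")
```

With this sketch, a 200-byte argument string that would have overrun the original 128-byte buffer fits in the enlarged one, while a 300-byte string is rejected with a message that names the trap and shows its value.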
Temporary fix
****************************************************************
* HIPER                                                        *
****************************************************************
TEMPORARILY SUSPEND MONITORING OF JOBS THAT GENERATE WTOR MESSAGES, SET THE WTOR_ALERTS INITIAL PARAMETER TO 'NO' OR REMOVE IT FROM THE PARAMETERS DATA SET. RECYCLING/RESTARTING THE EVENT PUMP SHOULD CORRECT THE EVENT LOSS.
Comments
APAR Information
APAR number
OA36516
Reported component name
EVENT PUMP FOR
Reported component ID
5698B3400
Reported release
422
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt
Submitted date
2011-05-19
Closed date
2011-06-02
Last modified date
2011-07-05
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UA60672 UA60673 UA60674
Modules/Macros
GTMAOPD3 GTMCTMON GTMCTPUT GTMCTSRP GTMRXCTP
Fix information
Fixed component name
EVENT PUMP FOR
Fixed component ID
5698B3400
Applicable component levels
R420 PSY UA60672
UP11/06/14 P F106
R421 PSY UA60673
UP11/06/14 P F106
R422 PSY UA60674
UP11/06/14 P F106
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
05 July 2011