To avoid skewing the message arrival rate, PFA ignores the first hour of message data after IPL and the last hour of message data before shutdown. In addition, PFA attempts to track the same persistent jobs that it tracked before the IPL or PFA restart. (For PFA to resume tracking, the same persistent jobs must still be active and ten jobs must have been tracked previously.)
This check is not designed to detect performance problems that are caused by insufficient resources, faulty WLM policy, or spikes in work. However, it might help to determine if a performance problem detected by a performance monitor or WLM is caused by a damaged system.
The message arrival rate check issues an exception using four types of comparisons, which are described in more detail in the next section:
If PFA detects an unexpectedly low number of messages, examine the report in SDSF for details about why the exception was issued. Use the Runtime Diagnostics output in the report to help you diagnose and fix the problem. For more information, see Runtime Diagnostics.
Parameter name | Default value | Minimum Value | Maximum Value | Description |
---|---|---|---|---|
collectint | 15 Minutes | 15 | 360 | This parameter determines how often (in minutes) to run the data collector that retrieves the current message arrival rate. |
modelint | 720 Minutes | 60 | 1440 | This parameter determines how often (in minutes) you want the system to analyze the data and construct a new message arrival rate model or prediction. By default, PFA analyzes the data and constructs a new model every 720 minutes (12 hours). The model interval must be at least four times larger than the collection interval. Note that, even when you set a value larger than 360, PFA performs the first model at 360 minutes (6 hours). |
stddev | 10 | 2 | 100 | This parameter is used to specify how much variance is allowed between the actual message arrival rate per amount of CPU and the expected message arrival rate. It is used when determining if the actual message arrival rate has increased beyond the allowable upper limit. It also determines how much variance is allowed across the time range predictions. If you set the STDDEV parameter to a smaller value, an exception is issued if the actual message arrival rate is closer to the expected message arrival rate and the predictions across the time ranges are consistent. If you set the STDDEV parameter to a larger value, an exception is issued if the actual message arrival rate is significantly greater than the expected message arrival rate even if the predictions across the different time ranges are inconsistent. |
collectinactive | 1 (on) | 0 (off) | 1 (on) | Defines whether data will be collected and modeled even if the check is not eligible to run, not ACTIVE(ENABLED), in IBM® Health Checker for z/OS. |
trackedmin | 3 | 0 | 1000 | This parameter defines the minimum message arrival rate required for a persistent job in order for it to be considered a top persistent job that should be tracked individually. |
exceptionmin | 1 | 0 | 1000 | This parameter is used when determining if an exception should be issued for an unexpectedly high message arrival rate. For tracked jobs and other persistent jobs, this parameter defines the minimum message arrival rate and the minimum predicted message arrival rate required to cause a too high exception. For non-persistent jobs and the total system comparisons, this parameter defines the minimum message arrival rate required to cause a too high exception. |
checklow | 1 (on) | 0 (off) | 1 (on) | Defines whether Runtime Diagnostics is run to validate that the absence of messages is caused by a problem. If this value is off, exceptions are not issued for conditions in which the message arrival rate is unexpectedly low. |
stddevlow | 4 | 2 | 100 | This parameter is used to specify how much variance is allowed between the actual message arrival rate per amount of CPU and the expected message arrival rate when determining if the actual rate is unexpectedly low. |
limitlow | 3 | 1 | 100 | This parameter defines the maximum message arrival rate allowed when issuing an exception for an unexpectedly low number of messages. |
debug | 0 (off) | 0 (off) | 1 (on) | This parameter (an integer of 0 or 1) is used at the direction of IBM service to generate additional diagnostic information for the IBM Support Center. This debug parameter is used in place of the IBM Health Checker for z/OS policy. The default is off (0). |
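The ranges in the table, together with the rule that the model interval must be at least four times the collection interval, can be checked before a parameter change is submitted. The following Python sketch is illustrative only: `PARAMETER_RANGES` and `validate_parameters` are hypothetical names, not part of PFA, and the sketch assumes that "at least four times larger" means `modelint >= 4 * collectint`.

```python
# (minimum, maximum) pairs copied from the parameter table.
PARAMETER_RANGES = {
    "collectint": (15, 360),
    "modelint": (60, 1440),
    "stddev": (2, 100),
    "collectinactive": (0, 1),
    "trackedmin": (0, 1000),
    "exceptionmin": (0, 1000),
    "checklow": (0, 1),
    "stddevlow": (2, 100),
    "limitlow": (1, 100),
    "debug": (0, 1),
}


def validate_parameters(params):
    """Return a list of problems found in a dict of parameter values."""
    problems = []
    for name, value in params.items():
        if name not in PARAMETER_RANGES:
            problems.append(f"{name}: unknown parameter")
            continue
        low, high = PARAMETER_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside {low}-{high}")
    # Assumption: "four times larger" is read as modelint >= 4 * collectint.
    if "collectint" in params and "modelint" in params:
        if params["modelint"] < 4 * params["collectint"]:
            problems.append("modelint must be at least 4 x collectint")
    return problems


# Default values from the table.
defaults = {
    "collectint": 15, "modelint": 720, "stddev": 10, "collectinactive": 1,
    "trackedmin": 3, "exceptionmin": 1, "checklow": 1, "stddevlow": 4,
    "limitlow": 3, "debug": 0,
}
print(validate_parameters(defaults))
print(validate_parameters({"collectint": 60, "modelint": 120}))
```

The defaults pass without problems. The second call is rejected because a 120-minute model interval is less than four times a 60-minute collection interval, even though both values are inside their individual ranges.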
AIR018I 02:22:54 PFA CHECK DETAIL
CHECK NAME: PFA_MESSAGE_ARRIVAL_RATE
ACTIVE : YES
TOTAL COLLECTION COUNT : 5
SUCCESSFUL COLLECTION COUNT : 5
LAST COLLECTION TIME : 02/05/2009 10:18:22
LAST SUCCESSFUL COLLECTION TIME: 02/05/2009 10:18:22
NEXT COLLECTION TIME : 02/05/2009 10:33:22
TOTAL MODEL COUNT : 1
SUCCESSFUL MODEL COUNT : 1
LAST MODEL TIME : 02/05/2009 10:18:24
LAST SUCCESSFUL MODEL TIME : 02/05/2009 10:18:24
NEXT MODEL TIME : 02/05/2009 22:18:24
CHECK SPECIFIC PARAMETERS:
COLLECTINT : 15
MODELINT : 720
COLLECTINACTIVE : 1=ON
DEBUG : 0=OFF
STDDEV : 10
TRACKEDMIN : 3
EXCEPTIONMIN : 1
CHECKLOW : 1=ON
STDDEVLOW : 4
LIMITLOW : 3
EXCLUDED JOBS:
NAME SYSTEM DATE ADDED REASON ADDED
JES * 2010/03/31 00:00 Exclude JES* jobs on ALL.
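In the sample display, each NEXT time equals the corresponding LAST time plus the interval shown under CHECK SPECIFIC PARAMETERS. The following Python sketch reproduces that arithmetic. It is an observation about the sample values, not a statement of how PFA schedules work internally; `next_time` is a hypothetical helper, and the timestamp layout is assumed to be month/day/year.

```python
from datetime import datetime, timedelta

# Assumed layout of the timestamps in the sample display.
FORMAT = "%m/%d/%Y %H:%M:%S"


def next_time(last, interval_minutes):
    """Add an interval in minutes to a timestamp string."""
    moment = datetime.strptime(last, FORMAT)
    return (moment + timedelta(minutes=interval_minutes)).strftime(FORMAT)


# LAST COLLECTION TIME plus COLLECTINT (15 minutes).
print(next_time("02/05/2009 10:18:22", 15))
# LAST MODEL TIME plus MODELINT (720 minutes).
print(next_time("02/05/2009 10:18:24", 720))
```

The two results match the NEXT COLLECTION TIME (10:33:22) and NEXT MODEL TIME (22:18:24) values in the sample.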
UPDATE CHECK(IBMPFA,PFA_MESSAGE_ARRIVAL_RATE)
ACTIVE
SEVERITY(MEDIUM)
INTERVAL(ONETIME)
PARMS=('COLLECTINT(15)','MODELINT(720)','STDDEV(10)','DEBUG(0)',
'COLLECTINACTIVE(1)','EXCEPTIONMIN(1)','TRACKEDMIN(3)',
'CHECKLOW(1)','STDDEVLOW(4)','LIMITLOW(3)')
DATE(20080330)
REASON('The message arrival rate is abnormal which
can indicate a system that is damaged.')
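When the PARMS list is maintained by a script, generating it from a dictionary keeps every entry quoted and separated by a comma. The following Python sketch is a minimal illustration. `format_parms` is a hypothetical helper, the line breaks it produces are arbitrary, and it does not check any column or length limits that apply where the statement is stored.

```python
def format_parms(params, per_line=4):
    """Build a PARMS=(...) clause from a dict of parameter values."""
    items = [f"'{name.upper()}({value})'" for name, value in params.items()]
    # Group the quoted items into lines of at most per_line entries.
    lines = [
        ",".join(items[start:start + per_line])
        for start in range(0, len(items), per_line)
    ]
    # Joining with ",\n" leaves a trailing comma on every line but the last.
    return "PARMS=(" + ",\n".join(lines) + ")"


parms = format_parms({
    "collectint": 15, "modelint": 720, "stddev": 10, "debug": 0,
    "collectinactive": 1, "exceptionmin": 1, "trackedmin": 3,
    "checklow": 1, "stddevlow": 4, "limitlow": 3,
})
print(parms)
```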
The message arrival rate check is designed to run automatically after every data collection. Do not change the INTERVAL parameter.
Message Arrival Rate Prediction Report
Last successful model time : 01/27/2009 17:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Persistent address spaces with high rates:
Predicted Message
Message Arrival Rate
Job Arrival
Name ASID Rate 1 Hour 24 Hour 7 Day
________ ____ ____________ ____________ ____________ ____________
TRACKED1 001D 75.63 23.88 22.82 15.82
TRACKED2 0028 43.52 0.34 11.11 12.11
TRACKED3 0029 11.00 12.43 2.36 8.36
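The job rows in this report have a fixed column order: job name, ASID, current message arrival rate, and the 1 hour, 24 hour, and 7 day predictions. The following Python sketch parses the sample rows and flags jobs whose current rate is above the highest of the three predictions. `TrackedJob` and `parse_row` are hypothetical names, and the comparison is a simplification for reading the report. It is not the algorithm PFA uses, which also takes parameters such as STDDEV and EXCEPTIONMIN into account.

```python
from typing import NamedTuple


class TrackedJob(NamedTuple):
    name: str
    asid: str
    rate: float
    predicted_1_hour: float
    predicted_24_hour: float
    predicted_7_day: float


def parse_row(line):
    """Split one whitespace-separated report row into a TrackedJob."""
    name, asid, *numbers = line.split()
    rate, hour, day, week = (float(number) for number in numbers)
    return TrackedJob(name, asid, rate, hour, day, week)


# Rows copied from the sample report.
rows = [
    "TRACKED1 001D 75.63 23.88 22.82 15.82",
    "TRACKED2 0028 43.52 0.34 11.11 12.11",
    "TRACKED3 0029 11.00 12.43 2.36 8.36",
]
jobs = [parse_row(row) for row in rows]
for job in jobs:
    highest = max(job.predicted_1_hour, job.predicted_24_hour, job.predicted_7_day)
    # Simplified check: is the current rate above every prediction?
    print(job.name, job.rate, highest, job.rate > highest)
```

For the sample data, TRACKED1 and TRACKED2 are above all three predictions, while TRACKED3 is below its 1 hour prediction of 12.43.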
Message Arrival Rate Prediction Report
Last successful model time : 01/27/2009 17:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Persistent address spaces with low rates:
Predicted Message
Message Arrival Rate
Job Arrival
Name ASID Rate 1 Hour 24 Hour 7 Day
________ ____ ____________ ____________ ____________ ____________
JOBS4 001F 1.17 23.88 22.82 15.82
JOBS5 002D 2.01 8.34 11.11 12.11
Runtime Diagnostics Output:
Runtime Diagnostics detected a problem in job: JOBS4
EVENT 06: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID CPU RATE: 96% ASID: 0027 JOBNAME: JOBS4
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
EVENT 07: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID: 0027 JOBNAME: JOBS4 TCB: 004E6850
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Runtime Diagnostics detected a problem in job: JOBS5
EVENT 03: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID CPU RATE: 96% ASID: 0027 JOBNAME: JOBS5
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
EVENT 04: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID: 0027 JOBNAME: JOBS5 TCB: 004E6850
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Message Arrival Rate Prediction Report
Last successful model time : 01/27/2009 17:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Other persistent jobs group:
Prediction based on 1 hour of data : 20.27
Prediction based on 24 hours of data : 27.98
Prediction based on 7 days of data : 31.22
Persistent address spaces with high rates:
Message
Job Arrival
Name ASID Rate
________ ____ ____________
PERS1 001E 83.22
PERS2 0038 75.52
PERS3 0039 47.47
Message Arrival Rate Prediction Report
Last successful model time : 01/27/2009 17:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Other persistent jobs group:
Prediction based on 1 hour of data : 20.27
Prediction based on 24 hours of data : 27.98
Prediction based on 7 days of data : 31.22
Runtime Diagnostics Output:
EVENT 01: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID CPU RATE: 96% ASID: 0027 JOBNAME: PERS4
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
EVENT 02: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID: 0027 JOBNAME: PERS4 TCB: 004E6850
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Message Arrival Rate Prediction Report
Last successful model time : 01/27/2009 17:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Non-persistent jobs group:
Message arrival rate
in last collection interval : 65.49
Prediction based on 1 hour of data : 20.27
Prediction based on 24 hours of data : 27.98
Prediction based on 7 days of data : 31.22
Address spaces with high arrivals:
Message
Job Arrival
Name ASID Counts
________ ____ ____________
NONPERS1 001F 83
NONPERS2 0048 52
NONPERS3 0049 47
Message Arrival Rate Prediction Report
Last successful model time : 01/27/2009 17:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Message arrival rate
at last collection interval : 83.52
Prediction based on 1 hour of data : 98.27
Prediction based on 24 hours of data : 85.98
Prediction based on 7 days of data : 100.22
Top persistent users:
Predicted Message
Message Arrival Rate
Job Arrival
Name ASID Rate 1 Hour 24 Hour 7 Day
________ ____ ____________ ____________ ____________ ____________
JOB1 001D 58.00 23.88 22.82 15.82
JOB2 0028 11.00 0.34 11.11 12.11
JOB3 0029 11.00 12.43 2.36 8.36
...
Message Arrival Rate Prediction Report
Last successful model time : 01/27/2009 17:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Message arrival rate
in last collection interval : 2.02
Prediction based on 1 hour of data : 98.27
Prediction based on 24 hours of data : 85.98
Prediction based on 7 days of data : 100.22
Runtime Diagnostics Output:
EVENT 01: HIGH - CF - SYSTEM: SY1 2009/06/12 - 13:23:25
IXL013I IXLCONN REQUEST FOR STRUCTURE SYSZWLM_WORKUNIT FAILED.
JOBNAME: WLM ASID: 000A CONNECTOR NAME: #SY1
IXLCONN RETURN CODE: 0000000C, REASON CODE: 02010C05
STRUCTURE NOT DEFINED IN THE CFRM ACTIVE POLICY
CONADIAG0: 00000002
----------------------------------------------------------------------
EVENT 02: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:23:55
IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
RSN: START REQUEST FAILED
----------------------------------------------------------------------
EVENT 03: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:23:58
IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
RSN: START REQUEST FAILED
----------------------------------------------------------------------
EVENT 04: HIGH - CF - SYSTEM: SY1 2009/06/12 - 13:24:02
IXL013I IXLCONN REQUEST FOR STRUCTURE IXCT_SIGNAL FAILED.
JOBNAME: XCFAS ASID: 0006 CONNECTOR NAME: SIGPATH_01000008
IXLCONN RETURN CODE: 0000000C, REASON CODE: 02010C05
STRUCTURE NOT DEFINED IN THE CFRM ACTIVE POLICY
CONADIAG0: 00000002
----------------------------------------------------------------------
EVENT 05: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:24:02
IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
RSN: START REQUEST FAILED
----------------------------------------------------------------------
EVENT 06: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID CPU RATE: 96% ASID: 0027 JOBNAME: DAVIDZ
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
EVENT 07: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID: 0027 JOBNAME: DAVIDZ TCB: 004E6850
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Name | Symbol | Size |
---|---|---|
Kilo | K | 1,024 |
Mega | M | 1,048,576 |
Giga | G | 1,073,741,824 |
Tera | T | 1,099,511,627,776 |
Peta | P | 1,125,899,906,842,624 |
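The sizes in this table are successive powers of 1,024. The following Python sketch derives them instead of hard-coding each value. `SYMBOLS` and `MULTIPLIERS` are hypothetical names used only for this illustration.

```python
# Symbols in ascending order; each step multiplies the size by 1,024.
SYMBOLS = ["K", "M", "G", "T", "P"]
MULTIPLIERS = {
    symbol: 1024 ** (index + 1) for index, symbol in enumerate(SYMBOLS)
}

for symbol, size in MULTIPLIERS.items():
    # Print with thousands separators, as in the table.
    print(f"{symbol} | {size:,}")
```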
The following fields apply to all reports:
Guideline: If the use of the z/OS image is radically different after an IPL (for instance, the change from a test system to a production system), delete the files in the PFA_MESSAGE_ARRIVAL_RATE/data directory to enable the check to collect the most accurate modeling information.
Results files
Data store files:
Intermediate files:
This directory holds the following log files. Additional information is written to these log files when DEBUG(1) is set.