To avoid skewing the SMF arrival rate, PFA ignores the first hour of SMF data after IPL and the last hour of SMF data prior to shutdown. In addition, PFA attempts to track the same persistent jobs that it tracked prior to the IPL or PFA restart. It does so only if those jobs are still active and ten jobs were tracked previously.
This check is not designed to detect performance problems caused by insufficient resources, faulty WLM policy, or spikes in work. However, it might help to determine if a performance problem detected by a performance monitor or WLM is caused by a damaged system.
Guideline: If you modify the SMF record types in the SMFPRMxx parmlib member, delete the files in the PFA_SMF_ARRIVAL_RATE/data directory to ensure PFA is collecting relevant information.
If an unexpectedly low number of SMF records was detected, the best practice is to examine the report in SDSF for details about why the exception was issued. Use the Runtime Diagnostics output in the report to assist you in diagnosing and fixing the problem. For more information about Runtime Diagnostics, see Runtime Diagnostics.
Parameter name | Default value | Minimum Value | Maximum Value | Description |
---|---|---|---|---|
collectint | 15 Minutes | 15 | 360 | This parameter determines how often (in minutes) to run the data collector that retrieves the current SMF arrival rate. |
modelint | 720 Minutes | 60 | 1440 | This parameter determines how often (in minutes) you want the system to analyze the data and construct a new SMF arrival rate model or prediction. By default, PFA analyzes the data and constructs a new model every 720 minutes (12 hours). The model interval must be at least four times larger than the collection interval. Note that, even when you set a value larger than 360, PFA performs the first model at 360 minutes (6 hours). |
stddev | 3 | 2 | 100 | This parameter is used to specify how much variance is allowed between the actual SMF arrival rate per amount of CPU and the expected SMF arrival rate. It is used when determining if the actual SMF arrival rate has increased beyond the allowable upper limit. It also determines how much variance is allowed across the time range predictions. If you set the STDDEV parameter to a small value, an exception will be issued if the actual SMF arrival rate is closer to the expected SMF arrival rate and the predictions across the time ranges are consistent. If you set the STDDEV parameter to a larger value, an exception will be issued if the actual SMF arrival rate is significantly greater than the expected SMF arrival rate even if the predictions across the different time ranges are inconsistent. |
collectinactive | 1 (on) | 0 (off) | 1 (on) | Defines whether data will be collected and modeled even if the check is not eligible to run, not ACTIVE(ENABLED), in IBM® Health Checker for z/OS. |
trackedmin | 3 | 0 | 1000 | This parameter defines the minimum SMF arrival rate required for a persistent job in order for it to be considered a top persistent job that should be tracked individually. |
exceptionmin | 1 | 0 | 1000 | This parameter is used when determining if an exception should be issued for an unexpectedly high SMF arrival rate. For tracked jobs and other persistent jobs, this parameter defines the minimum SMF arrival rate and the minimum predicted SMF arrival rate required to cause a too high exception. For non-persistent jobs and the total system comparisons, this parameter defines the minimum SMF arrival rate required to cause a too high exception. |
checklow | 1 (on) | 0 (off) | 1 (on) | Defines whether Runtime Diagnostics is run to validate that the absence of SMF records is caused by a problem. If this value is off, exceptions are not issued for conditions in which the SMF arrival rate is unexpectedly low. |
stddevlow | 4 | 2 | 100 | This parameter is used to specify how much variance is allowed between the actual SMF arrival rate per amount of CPU and the expected SMF arrival rate when determining if the actual rate is unexpectedly low. |
limitlow | 3 | 1 | 100 | This parameter defines the maximum SMF arrival rate allowed when issuing an exception for an unexpectedly low number of SMF records. |
debug | 0 (off) | 0 (off) | 1 (on) | This parameter (an integer of 0 or 1) is used at the direction of IBM service to generate additional diagnostic information for the IBM Support Center. This debug parameter is used in place of the IBM Health Checker for z/OS policy. The default is off (0). |
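
The ranges in the table, together with the rule that the model interval must be at least four times the collection interval, can be checked before a change is submitted. The following Python sketch is illustrative only and is not part of PFA. The names `RANGES`, `DEFAULTS`, `validate`, and `first_model_minutes` are hypothetical, and the handling of model intervals below 360 minutes in `first_model_minutes` is an assumption, because the table describes only the case of values larger than 360.

```python
# Ranges copied from the parameter table: name -> (minimum, maximum).
RANGES = {
    "collectint": (15, 360),
    "modelint": (60, 1440),
    "stddev": (2, 100),
    "collectinactive": (0, 1),
    "trackedmin": (0, 1000),
    "exceptionmin": (0, 1000),
    "checklow": (0, 1),
    "stddevlow": (2, 100),
    "limitlow": (1, 100),
    "debug": (0, 1),
}

# Default values copied from the same table.
DEFAULTS = {
    "collectint": 15,
    "modelint": 720,
    "stddev": 3,
    "collectinactive": 1,
    "trackedmin": 3,
    "exceptionmin": 1,
    "checklow": 1,
    "stddevlow": 4,
    "limitlow": 3,
    "debug": 0,
}


def validate(params):
    """Return a list of problems found in a dict of parameter overrides."""
    merged = {**DEFAULTS, **params}
    problems = []
    for name, value in merged.items():
        if name not in RANGES:
            problems.append(f"{name}: unknown parameter")
            continue
        low, high = RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}..{high}")
    # Documented rule: modelint must be at least four times collectint.
    if merged["modelint"] < 4 * merged["collectint"]:
        problems.append("modelint must be at least four times collectint")
    return problems


def first_model_minutes(modelint):
    """Minutes until the first model; capped at 360 as the table describes.

    Assumption: values below 360 are used unchanged.
    """
    return min(modelint, 360)
```

For example, `validate({"collectint": 30, "modelint": 60})` reports the four-times rule, while `validate({})` returns an empty list because the defaults satisfy every range.
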
AIR018I 02:22:54 PFA CHECK DETAIL
CHECK NAME: PFA_SMF_ARRIVAL_RATE
ACTIVE : YES
TOTAL COLLECTION COUNT : 5
SUCCESSFUL COLLECTION COUNT : 5
LAST COLLECTION TIME : 02/05/2009 10:18:22
LAST SUCCESSFUL COLLECTION TIME : 02/05/2009 10:18:22
NEXT COLLECTION TIME : 02/05/2009 10:33:22
TOTAL MODEL COUNT : 1
SUCCESSFUL MODEL COUNT : 1
LAST MODEL TIME : 02/05/2009 10:18:24
LAST SUCCESSFUL MODEL TIME : 02/05/2009 10:18:24
NEXT MODEL TIME : 02/05/2009 22:18:24
CHECK SPECIFIC PARAMETERS:
COLLECTINT : 15
MODELINT : 720
COLLECTINACTIVE : 1=YES
DEBUG : 0=NO
STDDEV : 3
TRACKEDMIN : 3
EXCEPTIONMIN : 1
CHECKLOW : 1=YES
STDDEVLOW : 4
LIMITLOW : 3
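
The next collection and model times in the display above follow from the last times plus the corresponding interval. The following Python sketch reproduces that arithmetic as a worked example. The function name `next_time` is hypothetical, and the month/day/year reading of the timestamps is an assumption based on the sample values.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the display above (assumed month/day/year).
FORMAT = "%m/%d/%Y %H:%M:%S"


def next_time(last, interval_minutes):
    """Add an interval in minutes to a timestamp string in FORMAT."""
    start = datetime.strptime(last, FORMAT)
    return (start + timedelta(minutes=interval_minutes)).strftime(FORMAT)


# LAST COLLECTION TIME plus COLLECTINT(15)
print(next_time("02/05/2009 10:18:22", 15))   # 02/05/2009 10:33:22
# LAST MODEL TIME plus MODELINT(720)
print(next_time("02/05/2009 10:18:24", 720))  # 02/05/2009 22:18:24
```
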
UPDATE CHECK(IBMPFA,PFA_SMF_ARRIVAL_RATE)
ACTIVE
SEVERITY(MEDIUM)
INTERVAL(ONETIME)
PARMS=('COLLECTINT(15)','MODELINT(720)','STDDEV(3)','DEBUG(0)',
'COLLECTINACTIVE(1)','EXCEPTIONMIN(1)','TRACKEDMIN(3)',
'CHECKLOW(1)','STDDEVLOW(4)','LIMITLOW(3)')
DATE(20080330)
REASON('The SMF arrival rate is abnormal which can indicate a
system that is damaged.')
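
When the parameter list is generated rather than typed by hand, it is easier to keep the quoting and the comma separators consistent across line breaks. The following Python sketch formats a set of values in the same layout as the PARMS clause above. The function name `build_parms` and the `per_line` argument are hypothetical, and the sketch only builds text; it does not issue any command.

```python
def build_parms(values, per_line=4):
    """Format parameters as a PARMS=(...) clause, a few quoted items per line."""
    items = [f"'{name.upper()}({value})'" for name, value in values.items()]
    # Group the quoted items so that each output line holds per_line of them.
    lines = [
        ",".join(items[i:i + per_line])
        for i in range(0, len(items), per_line)
    ]
    # A comma also separates the last item on one line from the first on the next.
    return "PARMS=(" + ",\n".join(lines) + ")"


print(build_parms({"collectint": 15, "modelint": 720, "stddev": 3, "debug": 0,
                   "collectinactive": 1, "exceptionmin": 1, "trackedmin": 3}))
```
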
The SMF arrival rate check is designed to run automatically after every data collection. Do not change the INTERVAL parameter.
SMF Arrival Rate Prediction Report
Last successful model time : 01/27/2009 11:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 720
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Persistent address spaces with high rates:
Predicted SMF
SMF Arrival Rate
Job Arrival
Name ASID Rate 1 Hour 24 Hour 7 Day
TRACKED1 001D 75.63 23.88 22.82 15.82
TRACKED2 0028 43.52 0.34 11.11 12.11
TRACKED3 0029 53.25 12.43 2.36 8.36
SMF Arrival Rate Prediction Report
Last successful model time : 01/27/2009 11:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Persistent address spaces with low rates:
Predicted SMF
SMF Arrival Rate
Job Arrival
Name ASID Rate 1 Hour 24 Hour 7 Day
TRACKED4 005D 0.20 23.88 22.82 15.82
TRACKED5 0034 0.01 12.43 11.11 8.36
Runtime Diagnostics Output:
Runtime Diagnostics detected a problem in job: TRACKED4
EVENT 06: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID CPU RATE: 96% ASID: 0027 JOBNAME: TRACKED4
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
EVENT 07: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID: 0027 JOBNAME: TRACKED4 TCB: 004E6850
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Runtime Diagnostics detected a problem in job: TRACKED5
EVENT 08: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID CPU RATE: 96% ASID: 0027 JOBNAME: TRACKED5
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
EVENT 09: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID: 0027 JOBNAME: TRACKED5 TCB: 004E6850
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
SMF Arrival Rate Prediction Report
Last successful model time : 01/27/2009 11:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Other persistent jobs group:
Prediction based on 1 hour of data : 20.27
Prediction based on 24 hours of data : 27.98
Prediction based on 7 days of data : 31.22
Persistent address spaces with high rates:
SMF
Job Arrival
Name ASID Rate
________ ____ ____________
PERS1 001E 83.22
PERS2 0038 75.52
PERS3 0039 47.47
Last successful model time : 01/27/2009 11:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Other persistent jobs group:
Prediction based on 1 hour of data : 20.27
Prediction based on 24 hours of data : 27.98
Prediction based on 7 days of data : 31.22
Runtime Diagnostics Output:
EVENT 01: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID CPU RATE: 96% ASID: 0027 JOBNAME: PERS4
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
EVENT 02: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID: 0027 JOBNAME: PERS4 TCB: 004E6850
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
EVENT 03: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID CPU RATE: 96% ASID: 0027 JOBNAME: PERS5
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
EVENT 04: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID: 0027 JOBNAME: PERS5 TCB: 004E6850
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
SMF Arrival Rate Prediction Report
Last successful model time : 01/27/2009 11:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
Non-persistent jobs group:
SMF arrival rate
in last collection interval : 65.49
Prediction based on 1 hour of data : 20.27
Prediction based on 24 hours of data : 27.98
Prediction based on 7 days of data : 31.22
Address spaces with high arrivals:
SMF
Job Arrival
Name ASID Counts
________ ____ ____________
NONPERS1 001F 83
NONPERS2 0048 52
NONPERS3 0049 47
SMF Arrival Rate Prediction Report
Last successful model time : 01/27/2009 11:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
SMF arrival rate
at last collection interval : 83.52
Prediction based on 1 hour of data : 98.27
Prediction based on 24 hours of data : 85.98
Prediction based on 7 days of data : 100.22
Top persistent users:
Predicted SMF
SMF Arrival Rate
Job Arrival
Name ASID Rate 1 Hour 24 Hour 7 Day
________ ____ ____________ ____________ ____________ ____________
TRACKED1 001D 58.00 23.88 22.82 15.82
TRACKED2 0028 11.00 0.34 11.11 12.11
TRACKED3 0029 11.00 12.43 2.36 8.36
.
.
.
SMF Arrival Rate Prediction Report
Last successful model time : 01/27/2009 11:08:01
Next model time : 01/27/2009 23:08:01
Model interval : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time : 01/27/2009 17:56:38
Collection interval : 15
SMF arrival rate
in last collection interval : 2.05
Prediction based on 1 hour of data : 98.27
Prediction based on 24 hours of data : 85.98
Prediction based on 7 days of data : 100.22
Runtime Diagnostics Output:
EVENT 01: HIGH - CF - SYSTEM: SY1 2009/06/12 - 13:23:25
IXL013I IXLCONN REQUEST FOR STRUCTURE SYSZWLM_WORKUNIT FAILED.
JOBNAME: WLM ASID: 000A CONNECTOR NAME: #SY1
IXLCONN RETURN CODE: 0000000C, REASON CODE: 02010C05
STRUCTURE NOT DEFINED IN THE CFRM ACTIVE POLICY
CONADIAG0: 00000002
----------------------------------------------------------------------
EVENT 02: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:23:55
IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
RSN: START REQUEST FAILED
----------------------------------------------------------------------
EVENT 03: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:23:58
IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
RSN: START REQUEST FAILED
----------------------------------------------------------------------
EVENT 04: HIGH - CF - SYSTEM: SY1 2009/06/12 - 13:24:02
IXL013I IXLCONN REQUEST FOR STRUCTURE IXCT_SIGNAL FAILED.
JOBNAME: XCFAS ASID: 0006 CONNECTOR NAME: SIGPATH_01000008
IXLCONN RETURN CODE: 0000000C, REASON CODE: 02010C05
STRUCTURE NOT DEFINED IN THE CFRM ACTIVE POLICY
CONADIAG0: 00000002
----------------------------------------------------------------------
EVENT 05: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:24:02
IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
RSN: START REQUEST FAILED
----------------------------------------------------------------------
EVENT 06: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID CPU RATE: 96% ASID: 0027 JOBNAME: DAVIDZ
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
EVENT 07: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
ASID: 0027 JOBNAME: DAVIDZ TCB: 004E6850
STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
JOBSTART: 2009/06/12 - 13:28:35
Error:
ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action:
USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Name | Sym | Size |
---|---|---|
Kilo | K | 1,024 |
Mega | M | 1,048,576 |
Giga | G | 1,073,741,824 |
Tera | T | 1,099,511,627,776 |
Peta | P | 1,125,899,906,842,624 |
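
Each symbol in the multiplier table is a power of 1,024. The following Python sketch expands a value written with one of these symbols, such as `4K`, into a plain number. The names `MULTIPLIERS` and `to_number` are hypothetical and are used only for illustration.

```python
# Multiplier symbols from the table; each step is a further factor of 1,024.
MULTIPLIERS = {
    "K": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
    "T": 1024 ** 4,
    "P": 1024 ** 5,
}


def to_number(text):
    """Expand a value such as '4K' or '512' into an integer."""
    text = text.strip().upper()
    if text and text[-1] in MULTIPLIERS:
        return int(text[:-1]) * MULTIPLIERS[text[-1]]
    return int(text)


print(to_number("4K"))  # 4096
print(to_number("2M"))  # 2097152
```
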
Guideline: If the use of the z/OS image is radically different after an IPL (for instance, a change from a test system to a production system) or if you modify the SMF record types in SMFPRMxx, delete the files in the PFA_SMF_ARRIVAL_RATE/data directory to ensure the check can collect the most accurate modeling information.
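
The cleanup in this guideline can be scripted. The following Python sketch lists, and optionally deletes, the regular files in the check's data directory. The function name `clear_check_data` and the `pfa_home` argument are hypothetical; the location of the PFA directory on your system is an assumption that you must supply. The sketch defaults to a dry run so that the file list can be reviewed before anything is removed.

```python
from pathlib import Path


def clear_check_data(pfa_home, check_name="PFA_SMF_ARRIVAL_RATE", dry_run=True):
    """Return the regular files in <pfa_home>/<check_name>/data.

    When dry_run is False, the files are also deleted. Subdirectories
    are left untouched.
    """
    data_dir = Path(pfa_home) / check_name / "data"
    targets = sorted(p for p in data_dir.iterdir() if p.is_file())
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling `clear_check_data(pfa_home)` with the default `dry_run=True` only reports what would be deleted; pass `dry_run=False` to remove the files.
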
Results files
Data store files:
Intermediate files:
This directory holds the following log files. Additional information is written to these log files when DEBUG(1) is set.