PFA_SMF_ARRIVAL_RATE

Description:
When SMF is active on the system, this check determines whether there is an abnormal SMF arrival rate per CPU millisecond. PFA examines only the SMF record types that you set the SMFPRMxx parmlib member to generate. When the number of SMF records written per CPU millisecond is unusually high or low, PFA can provide an early indication of a problem and potentially prevent damage to an LPAR.

To avoid skewing the SMF arrival rate, PFA ignores the first hour of SMF data after IPL and the last hour of SMF data prior to shutdown. In addition, PFA attempts to track the same persistent jobs that it tracked prior to IPL or PFA restart if the same persistent jobs are still active. (The same persistent jobs must still be active for PFA to track the same jobs and there must have been ten jobs tracked previously.)
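As a rough illustration of the rate this check models and of the IPL and shutdown exclusion described above, the following Python sketch computes SMF records per CPU millisecond for each collection interval and drops intervals that fall in the first hour after IPL or the last hour before shutdown. The `Sample` type and the function names are hypothetical, and treating an interval with no CPU time as a rate of 0.0 is an assumption; PFA's actual collection format is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

ONE_HOUR = timedelta(hours=1)


@dataclass
class Sample:
    """Hypothetical totals for one collection interval."""
    time: datetime
    smf_records: int
    cpu_ms: float


def arrival_rate(sample: Sample) -> float:
    """SMF records written per CPU millisecond in the interval."""
    # Assumption: an interval with no CPU time is treated as a rate of 0.0.
    if sample.cpu_ms <= 0:
        return 0.0
    return sample.smf_records / sample.cpu_ms


def usable_samples(samples: List[Sample], ipl_time: datetime,
                   shutdown_time: Optional[datetime] = None) -> List[Sample]:
    """Drop samples from the first hour after IPL and the last hour before shutdown."""
    kept = []
    for s in samples:
        if s.time < ipl_time + ONE_HOUR:
            continue  # still inside the first hour after IPL
        if shutdown_time is not None and s.time > shutdown_time - ONE_HOUR:
            continue  # inside the last hour before shutdown
        kept.append(s)
    return kept
```

While the system is running the shutdown time is unknown, so the sketch accepts `shutdown_time=None` and applies only the IPL rule.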

This check is not designed to detect performance problems caused by insufficient resources, faulty WLM policy, or spikes in work. However, it might help to determine if a performance problem detected by a performance monitor or WLM is caused by a damaged system.

Guideline: If you modify the SMF record types in the SMFPRMxx parmlib member, delete the files in the PFA_SMF_ARRIVAL_RATE/data directory to ensure PFA is collecting relevant information.

The SMF arrival rate check issues exceptions based on the following four types of comparisons, performed in order. After the check issues an exception, it does not perform the next comparison type. All jobs are included in these comparisons except those that match a job specified in the /config/EXCLUDED_JOBS file for this check.
Note: If SMF is restarted, the data previously collected will be automatically discarded and the check will enter the phase where it collects data for a period of time to use in determining the jobs to track. This processing is done to reduce false positives and to ensure that data that was potentially collected using different SMF record types is not used by PFA after SMF is restarted.
  1. Top persistent jobs: The SMF arrival rate check tracks the top persistent jobs individually. Jobs are considered persistent when they start within an hour after IPL. PFA determines which jobs to track individually based on the following:
    • If PFA previously ran on this system and the same 10 jobs that were previously tracked are active, PFA tracks the same jobs.
    • If PFA never ran on the system or the same jobs previously tracked are not all active, PFA collects data for a period of time to use in determining the jobs with the highest arrival rates. After this time, the jobs with the highest arrival rates for that period are individually tracked.
    PFA performs this type of comparison to determine if the SMF arrival rate is higher than expected or lower than expected.
  2. Other persistent jobs: The persistent jobs that are not individually tracked are considered "other persistent jobs". The SMF arrival rate check models predictions using the totals for this group. When PFA determines whether the SMF arrival rate is higher than expected, the comparisons are performed for each job individually using a mathematical formula. When PFA determines whether the SMF arrival rate is lower than expected, the comparisons are performed using the totals for the group.
  3. Non-persistent jobs: The jobs that start over an hour after IPL are the non-persistent jobs. The predictions and comparisons are done using the totals for this group. PFA only performs this type of comparison to determine if the SMF arrival rate is higher than expected.
  4. Total system: The predictions and the comparisons are done using the totals for the entire system. PFA performs this type of comparison to determine if the SMF arrival rate is higher than expected or lower than expected.
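The grouping and the stop-at-first-exception order described above can be sketched as follows. This is a simplified model under stated assumptions: the `Job` type and function names are hypothetical, EXCLUDED_JOBS entries are matched by exact name only, at most ten jobs are tracked (the number this topic associates with tracked jobs), a job whose rate equals TRACKEDMIN qualifies, and the reuse of previously tracked jobs after an IPL is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional, Set, Tuple

PERSISTENT_WINDOW = timedelta(hours=1)  # jobs starting within an hour of IPL
MAX_TRACKED = 10                        # assumption: at most ten tracked jobs


@dataclass
class Job:
    """Hypothetical summary of one address space."""
    name: str
    start: datetime
    rate: float  # SMF records per CPU millisecond during the warm-up period


def classify_jobs(jobs: Iterable[Job], ipl_time: datetime, excluded: Set[str],
                  trackedmin: float = 3.0):
    """Split jobs into tracked, other persistent, and non-persistent groups."""
    candidates = [j for j in jobs if j.name not in excluded]
    persistent = [j for j in candidates if j.start - ipl_time <= PERSISTENT_WINDOW]
    non_persistent = [j for j in candidates if j.start - ipl_time > PERSISTENT_WINDOW]
    # Highest-rate persistent jobs that reach TRACKEDMIN are tracked individually.
    ranked = sorted(persistent, key=lambda j: j.rate, reverse=True)
    tracked = [j for j in ranked if j.rate >= trackedmin][:MAX_TRACKED]
    tracked_ids = {id(j) for j in tracked}
    other = [j for j in persistent if id(j) not in tracked_ids]
    return tracked, other, non_persistent


def first_exception(comparisons: List[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run comparisons in order and stop at the first one that raises an exception."""
    for name, compare in comparisons:
        if compare():
            return name  # later comparison types are not evaluated
    return None
```

In use, the comparisons would be passed in the documented order: tracked jobs, other persistent jobs, non-persistent jobs, total system.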
Reason for check:
The objective of this check is to determine whether there is potential for damage to an LPAR by checking the arrival rate of SMF records per CPU millisecond on a system.
Best practice:
If an unexpectedly high number of SMF records was detected, the best practice is to review the SMF records being sent by the address spaces identified on the report and examine the system log to determine what is causing the increase in SMF activity.

If an unexpectedly low number of SMF records was detected, the best practice is to examine the report in SDSF for details about why the exception was issued. Use the Runtime Diagnostics output in the report to assist you in diagnosing and fixing the problem. For more information about Runtime Diagnostics, see Runtime Diagnostics.

z/OS® releases the check applies to:
z/OS V1R12 and later.
Type of check:
Remote
Parameters accepted:
Yes, as follows:
Table 1. PFA_SMF_ARRIVAL_RATE check parameters
collectint (default: 15 minutes; minimum: 15; maximum: 360)
    This parameter determines how often (in minutes) to run the data collector that retrieves the current SMF arrival rate.
modelint (default: 720 minutes; minimum: 60; maximum: 1440)
    This parameter determines how often (in minutes) you want the system to analyze the data and construct a new SMF arrival rate model or prediction. By default, PFA analyzes the data and constructs a new model every 720 minutes (12 hours). The model interval must be at least four times the collection interval. Even when you set a value larger than 360, PFA performs the first model at 360 minutes (6 hours).
stddev (default: 3; minimum: 2; maximum: 100)
    This parameter specifies how much variance is allowed between the actual SMF arrival rate per amount of CPU and the expected SMF arrival rate. It is used when determining whether the actual SMF arrival rate has increased beyond the allowable upper limit. It also determines how much variance is allowed across the time range predictions. If you set the STDDEV parameter to a small value, an exception is issued when the actual SMF arrival rate is closer to the expected SMF arrival rate and the predictions across the time ranges are consistent. If you set the STDDEV parameter to a larger value, an exception is issued when the actual SMF arrival rate is significantly greater than the expected SMF arrival rate, even if the predictions across the different time ranges are inconsistent.
collectinactive (default: 1 (on); minimum: 0 (off); maximum: 1 (on))
    Defines whether data is collected and modeled even if the check is not eligible to run (not ACTIVE(ENABLED)) in IBM® Health Checker for z/OS.
trackedmin (default: 3; minimum: 0; maximum: 1000)
    This parameter defines the minimum SMF arrival rate that a persistent job must have to be considered a top persistent job that is tracked individually.
exceptionmin (default: 1; minimum: 0; maximum: 1000)
    This parameter is used when determining whether an exception should be issued for an unexpectedly high SMF arrival rate. For tracked jobs and other persistent jobs, it defines the minimum SMF arrival rate and the minimum predicted SMF arrival rate required to cause a too high exception. For non-persistent jobs and the total system comparisons, it defines the minimum SMF arrival rate required to cause a too high exception.
checklow (default: 1 (on); minimum: 0 (off); maximum: 1 (on))
    Defines whether Runtime Diagnostics is run to validate that the absence of SMF records is caused by a problem. If this value is off, exceptions are not issued for conditions in which the SMF arrival rate is unexpectedly low.
stddevlow (default: 4; minimum: 2; maximum: 100)
    This parameter specifies how much variance is allowed between the actual SMF arrival rate per amount of CPU and the expected SMF arrival rate when determining whether the actual rate is unexpectedly low.
      • If you set the STDDEVLOW parameter to a smaller value, an exception is issued when the actual SMF arrival rate is closer to the expected SMF arrival rate.
      • If you set the STDDEVLOW parameter to a larger value, an exception is issued when the actual SMF arrival rate is significantly lower than the expected SMF arrival rate.
limitlow (default: 3; minimum: 1; maximum: 100)
    This parameter defines the maximum SMF arrival rate allowed when issuing an exception for an unexpectedly low number of SMF records.
debug (default: 0 (off); minimum: 0 (off); maximum: 1 (on))
    This parameter (an integer of 0 or 1) is used at the direction of IBM service to generate additional diagnostic information for the IBM Support Center. This debug parameter is used in place of the IBM Health Checker for z/OS policy. The default is off (0).
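A minimal Python sketch of checking a set of overrides against Table 1 before putting them in a policy. The defaults and ranges are copied from the table; reading "at least four times" as MODELINT >= 4 x COLLECTINT, and the first model running at the smaller of MODELINT and 360 minutes, are interpretations of the text above, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# (default, minimum, maximum) for each parameter, taken from Table 1.
PARAMETERS: Dict[str, Tuple[int, int, int]] = {
    "COLLECTINT": (15, 15, 360),
    "MODELINT": (720, 60, 1440),
    "STDDEV": (3, 2, 100),
    "COLLECTINACTIVE": (1, 0, 1),
    "TRACKEDMIN": (3, 0, 1000),
    "EXCEPTIONMIN": (1, 0, 1000),
    "CHECKLOW": (1, 0, 1),
    "STDDEVLOW": (4, 2, 100),
    "LIMITLOW": (3, 1, 100),
    "DEBUG": (0, 0, 1),
}


def validate(overrides: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Merge overrides onto the defaults and report any rule violations."""
    params = {name: spec[0] for name, spec in PARAMETERS.items()}
    errors: List[str] = []
    for name, value in overrides.items():
        key = name.upper()
        if key not in PARAMETERS:
            errors.append(f"unknown parameter {key}")
            continue
        _, low, high = PARAMETERS[key]
        if not low <= value <= high:
            errors.append(f"{key}={value} outside {low}..{high}")
        params[key] = value
    # Interpretation: the model interval must be at least 4 x the collection interval.
    if params["MODELINT"] < 4 * params["COLLECTINT"]:
        errors.append("MODELINT must be at least 4 x COLLECTINT")
    return params, errors


def first_model_minutes(modelint: int) -> int:
    """Assumed timing of the first model: MODELINT, capped at 360 minutes."""
    return min(modelint, 360)
```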
To determine the status of the SMF arrival rate check, issue f pfa,display,check(pfa_SMF_arrival_rate),detail. For the command example and more details, see . The following example shows the output written to message AIR018I in SDSF:
AIR018I 02:22:54 PFA CHECK DETAIL

CHECK NAME:  PFA_SMF_ARRIVAL_RATE
    ACTIVE                          : YES
    TOTAL COLLECTION COUNT          : 5
    SUCCESSFUL COLLECTION COUNT     : 5
    LAST COLLECTION TIME            : 02/05/2009 10:18:22
    LAST SUCCESSFUL COLLECTION TIME : 02/05/2009 10:18:22
    NEXT COLLECTION TIME            : 02/05/2009 10:33:22
    TOTAL MODEL COUNT               : 1
    SUCCESSFUL MODEL COUNT          : 1 
    LAST MODEL TIME                 : 02/05/2009 10:18:24
    LAST SUCCESSFUL MODEL TIME      : 02/05/2009 10:18:24
    NEXT MODEL TIME                 : 02/05/2009 22:18:24
    CHECK SPECIFIC PARAMETERS:
       COLLECTINT                   : 15
       MODELINT                     : 720
       COLLECTINACTIVE              : 1=YES
       DEBUG                        : 0=NO
       STDDEV                       : 3
       TRACKEDMIN                   : 3
       EXCEPTIONMIN                 : 1
       CHECKLOW                     : 1=YES
       STDDEVLOW                    : 4
       LIMITLOW                     : 3
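If you capture the AIR018I output as text (for example, by copying it from SDSF), a small parser can turn it into key/value pairs for scripted checks. The sketch below is an assumption-based illustration: it relies only on the KEY : VALUE layout shown above, and the function names are hypothetical.

```python
import re
from typing import Dict

# "KEY : VALUE" where KEY is uppercase words. Header lines such as
# "AIR018I 02:22:54 ..." and "CHECK SPECIFIC PARAMETERS:" do not match.
FIELD = re.compile(r"^\s*([A-Z][A-Z ]*?)\s*:\s*(\S.*?)\s*$")


def parse_check_detail(text: str) -> Dict[str, str]:
    """Collect the KEY : VALUE fields of AIR018I check detail output."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def numeric(value: str) -> int:
    """Turn parameter values such as '15' or '1=YES' into an integer."""
    return int(value.split("=", 1)[0])
```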
User override of IBM values:
The following shows keywords you can use to override check values on either a POLICY statement in the HZSPRMxx parmlib member or on a MODIFY command. This statement can be copied and modified to override the check defaults:
UPDATE CHECK(IBMPFA,PFA_SMF_ARRIVAL_RATE)
            ACTIVE
            SEVERITY(MEDIUM)
            INTERVAL(ONETIME)
      PARMS=('COLLECTINT(15)','MODELINT(720)','STDDEV(3)','DEBUG(0)',
            'COLLECTINACTIVE(1)','EXCEPTIONMIN(1)','TRACKEDMIN(3)',
            'CHECKLOW(1)','STDDEVLOW(4)','LIMITLOW(3)')
            DATE(20080330)
      REASON('The SMF arrival rate is abnormal which can indicate a 
             system that is damaged.')
The SMF arrival rate check is designed to run automatically after every data collection. Do not change the INTERVAL parameter.
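When an override list is split across lines, a missing comma between two quoted items is easy to overlook. The Python sketch below generates the PARMS=(...) clause from a dictionary so that every continued line ends with a comma. The function is hypothetical; it assumes the quoting and continuation layout of the UPDATE statement above and does not check any parmlib column or length limits.

```python
from typing import Dict


def build_parms(params: Dict[str, int], per_line: int = 4) -> str:
    """Render a PARMS=(...) clause: quoted NAME(value) items separated by commas."""
    items = [f"'{name}({value})'" for name, value in params.items()]
    chunks = [",".join(items[i:i + per_line]) for i in range(0, len(items), per_line)]
    # Every line except the last ends with a comma, so no separator is lost
    # when the list continues on the next line (indented under the first item).
    return "PARMS=(" + (",\n" + " " * 7).join(chunks) + ")"
```

For example, `build_parms({"COLLECTINT": 15, "MODELINT": 720})` yields `PARMS=('COLLECTINT(15)','MODELINT(720)')`.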
Verbose support:
The check provides additional detail in verbose mode. You can put a check into verbose mode using the UPDATE,filters,VERBOSE=ON parameters on either the MODIFY command or in a POLICY statement in an HZSPRMxx parmlib member.
Debug support:
The DEBUG parameter in IBM Health Checker for z/OS is ignored by this check. Rather, the debug parameter is a PFA check specific parameter. For details, see Understanding how to modify PFA checks.
Reference:
For more information about PFA, see the topic on Overview of Predictive Failure Analysis.
Messages:
The output is an SMF arrival rate prediction report that corresponds to the message issued. PFA generates one of the following reports:
  • AIRH187E and AIRH208E - tracked jobs exception report
  • AIRH188E and AIRH209E - other persistent jobs exception report
  • AIRH191E - other non-persistent jobs exception report
  • AIRH174E and AIRH175E - total system exception report
For additional message information, see the topics on:
SECLABEL recommended for MLS users:
SYSLOW
Output:
The output is a variation of the SMF arrival rate prediction report. The values found in the SMF arrival prediction file are as follows:
Tracked jobs exception report: PFA issues the SMF arrival rate exception report for tracked jobs when one or more tracked, persistent jobs cause an exception. These exceptions can be caused by a higher than expected SMF arrival rate or a lower than expected SMF arrival rate. Only the tracked jobs that caused the exception are included in the list of jobs on the report. If the report was generated due to a lower than expected SMF arrival rate, the report includes Runtime Diagnostics output that can help you diagnose the behavior. The following example is the SMF arrival rate tracked jobs exception report for jobs that had a higher than expected SMF arrival rate (AIRH187E):
Figure 1. SMF arrival rate: tracked jobs higher than expected
 SMF Arrival Rate Prediction Report

Last successful model time      :  01/27/2009 11:08:01   
Next model time                 :  01/27/2009 23:08:01   
Model interval                  :  720                   
Last successful collection time :  01/27/2009 17:41:38   
Next collection time            :  01/27/2009 17:56:38   
Collection interval             :  15                    
                                                        
Persistent address spaces with high rates:                                 
                                                                           
                                            Predicted SMF                  
                          SMF                Arrival Rate                  
  Job                 Arrival                                              
  Name     ASID          Rate        1 Hour       24 Hour         7 Day    
  TRACKED1 001D         75.63         23.88         22.82         15.82    
  TRACKED2 0028         43.52          0.34         11.11         12.11    
  TRACKED3 0029         53.25         12.43          2.36          8.36    
  
The following example is the SMF arrival rate tracked jobs exception report for jobs that had a lower than expected SMF arrival rate (AIRH208E):
Figure 2. SMF arrival rate: tracked jobs lower than expected
SMF Arrival Rate Prediction Report

Last successful model time      : 01/27/2009 11:08:01 
Next model time                 : 01/27/2009 23:08:01
Model interval                  : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time            : 01/27/2009 17:56:38
Collection interval             : 15

Persistent address spaces with low rates:
                                       Predicted SMF 
                        SMF              Arrival Rate             
Job                     Arrival                                       
Name       ASID         Rate        1 Hour       24 Hour      7 Day    
TRACKED4   005D         0.20        23.88        22.82        15.82
TRACKED5   0034         0.01        12.43        11.11         8.36

Runtime Diagnostics Output:

Runtime Diagnostics detected a problem in job: TRACKED4
  EVENT 06: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID CPU RATE: 96% ASID: 0027 JOBNAME: TRACKED4
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
  EVENT 07: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID: 0027 JOBNAME: TRACKED4 TCB: 004E6850
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Runtime Diagnostics detected a problem in job: TRACKED5
  EVENT 08: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID CPU RATE: 96% ASID: 0027 JOBNAME: TRACKED5
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
  EVENT 09: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID: 0027 JOBNAME: TRACKED5 TCB: 004E6850
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
Other persistent jobs exception report: PFA issues the SMF arrival rate other persistent jobs exception report when a persistent job that is not being individually tracked causes an exception when compared to the totals of the other persistent jobs. The exception can be caused by a higher than expected SMF arrival rate or a lower than expected SMF arrival rate. The predictions listed on this report are the predicted rates for the total other persistent jobs group. The list of jobs is generated only for a higher than expected SMF arrival rate, and it contains only those persistent jobs (not tracked individually) that have a problem. No predictions are given for these jobs because PFA does not model individual predictions for jobs that are not tracked individually. If there is more than one job with the same name, four asterisks (****) are printed for the ASID in the report. If the report was generated due to a lower than expected SMF arrival rate, the report includes Runtime Diagnostics output that can help you diagnose the behavior. The following example is the SMF arrival rate other persistent jobs exception report for jobs that had an unexpectedly high SMF arrival rate (AIRH188E):
Figure 3. SMF arrival rate: other persistent jobs higher than expected
SMF Arrival Rate Prediction Report

Last successful model time      :  01/27/2009 11:08:01   
Next model time                 :  01/27/2009 23:08:01   
Model interval                  :  360                   
Last successful collection time :  01/27/2009 17:41:38   
Next collection time            :  01/27/2009 17:56:38   
Collection interval             :  15                    

Other persistent jobs group:
Prediction based on 1 hour of data   : 20.27
Prediction based on 24 hours of data : 27.98
Prediction based on 7 days of data   : 31.22

Persistent address spaces with high rates:

                          SMF    
  Job                 Arrival    
  Name     ASID          Rate    
  ________ ____  ____________    
  PERS1    001E         83.22    
  PERS2    0038         75.52    
  PERS3    0039         47.47    
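The ASID rule for duplicate job names can be sketched as a small formatter. This Python example is illustrative only: the column widths are copied from the sample report, the function name is hypothetical, and keeping one row per address space (rather than combining same-named jobs into one row) is an assumption this topic does not settle.

```python
from collections import Counter
from typing import List, Tuple


def format_rows(jobs: List[Tuple[str, int, float]]) -> List[str]:
    """Format (job name, ASID, SMF arrival rate) rows like the report above."""
    name_counts = Counter(name for name, _, _ in jobs)
    rows = []
    for name, asid, rate in jobs:
        # Same job name seen more than once: print **** instead of one ASID.
        asid_text = "****" if name_counts[name] > 1 else f"{asid:04X}"
        rows.append(f"  {name:<8} {asid_text}  {rate:12.2f}")
    return rows
```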
  
This example is the SMF arrival rate other persistent jobs exception report for jobs that had a lower than expected SMF arrival rate (AIRH209E):
Figure 4. SMF arrival rate: other persistent jobs lower than expected
Last successful model time      : 01/27/2009 11:08:01
Next model time                 : 01/27/2009 23:08:01
Model interval                  : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time            : 01/27/2009 17:56:38
Collection interval             : 15

Other persistent jobs group:
 Prediction based on 1 hour of data   : 20.27
 Prediction based on 24 hours of data : 27.98
 Prediction based on 7 days of data   : 31.22

Runtime Diagnostics Output:

  EVENT 01: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID CPU RATE: 96% ASID: 0027 JOBNAME: PERS4
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
  EVENT 02: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID: 0027 JOBNAME: PERS4 TCB: 004E6850
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
  EVENT 03: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID CPU RATE: 96% ASID: 0027 JOBNAME: PERS5
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
  EVENT 04: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID: 0027 JOBNAME: PERS5 TCB: 004E6850
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Non-persistent jobs exception report: PFA issues the SMF arrival rate non-persistent jobs exception report when the non-persistent jobs (as a group) cause an exception. This exception is only issued for a higher than expected SMF arrival rate. The SMF arrival rate and predictions listed on this report are calculated for the total non-persistent jobs group. The list of jobs contains only three non-persistent jobs that have high arrival counts. No predictions are given for these jobs because PFA does not model individual predictions for jobs that are not tracked individually. The following example is the SMF arrival rate non-persistent jobs exception report (AIRH191E):
Figure 5. SMF arrival rate: non-persistent jobs with high counts
 SMF Arrival Rate Prediction Report                     

Last successful model time      :  01/27/2009 11:08:01   
Next model time                 :  01/27/2009 23:08:01   
Model interval                  :  360                   
Last successful collection time :  01/27/2009 17:41:38   
Next collection time            :  01/27/2009 17:56:38   
Collection interval             :  15                    

Non-persistent jobs group:
 SMF arrival rate                     
   in last collection interval        : 65.49
 Prediction based on 1 hour of data   : 20.27
 Prediction based on 24 hours of data : 27.98
 Prediction based on 7 days of data   : 31.22
Address spaces with high arrivals:

                          SMF          
  Job                 Arrival          
  Name     ASID        Counts          
  ________ ____  ____________          
  NONPERS1 001F            83          
  NONPERS2 0048            52          
  NONPERS3 0049            47          
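A Python sketch of the two pieces of the non-persistent report: picking the three jobs with the highest arrival counts, and the group-level too-high test gated by EXCEPTIONMIN. The `upper_limit` argument is a hypothetical stand-in for PFA's undocumented expected-rate limit, treating a rate equal to EXCEPTIONMIN as sufficient is an assumption, and both function names are invented for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def top_arrivals(counts: Dict[str, int], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n non-persistent jobs with the highest SMF arrival counts."""
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


def group_too_high(group_rate: float, upper_limit: float,
                   exceptionmin: float = 1.0) -> bool:
    """Too-high test for the non-persistent group as a whole.

    upper_limit stands in for PFA's expected-rate limit, whose formula is not
    documented here; EXCEPTIONMIN is the minimum rate that can cause an exception.
    """
    return group_rate >= exceptionmin and group_rate > upper_limit
```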
No problem and total system exception report: When no exception is issued or when a total SMF arrival rate exception is issued, the following report is generated. When there is no problem, or when an exception occurs because of a higher than expected SMF arrival rate, the list contains the jobs being tracked individually, which can number from one to ten. The Runtime Diagnostics section is written in the report when the exception is issued because of a lower than expected SMF arrival rate. The following example is the SMF arrival rate no problem report (AIRH176I) and the total system exception report issued due to a higher than expected SMF arrival rate (AIRH174E), showing three jobs.
Figure 6. SMF arrival rate: no problem and total system higher than expected
 SMF Arrival Rate Prediction Report

Last successful model time      :  01/27/2009 11:08:01   
Next model time                 :  01/27/2009 23:08:01   
Model interval                  :  360                  
Last successful collection time :  01/27/2009 17:41:38   
Next collection time            :  01/27/2009 17:56:38   
Collection interval             :  15                    

SMF arrival rate 
   at last collection interval       :       83.52
Prediction based on 1 hour of data   :       98.27
Prediction based on 24 hours of data :       85.98
Prediction based on 7 days of data   :      100.22

Top persistent users:                                                      
                                                                           
                                            Predicted SMF                  
                          SMF                Arrival Rate                  
  Job                 Arrival                                              
  Name     ASID          Rate        1 Hour       24 Hour         7 Day    
  ________ ____  ____________  ____________  ____________  ____________    
  TRACKED1 001D         58.00         23.88         22.82         15.82    
  TRACKED2 0028         11.00          0.34         11.11         12.11    
  TRACKED3 0029         11.00         12.43          2.36          8.36    
.
.
.
The following example is the SMF arrival rate total system exception report issued because of a lower than expected SMF arrival rate (AIRH175E):
Figure 7. SMF arrival rate: no problem and total system lower than expected
 SMF Arrival Rate Prediction Report

Last successful model time      : 01/27/2009 11:08:01
Next model time                 : 01/27/2009 23:08:01
Model interval                  : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time            : 01/27/2009 17:56:38
Collection interval             : 15

SMF arrival rate
   in last collection interval       : 2.05
Prediction based on 1 hour of data   : 98.27
Prediction based on 24 hours of data : 85.98
Prediction based on 7 days of data   : 100.22

Runtime Diagnostics Output:

  EVENT 01: HIGH - CF - SYSTEM: SY1 2009/06/12 - 13:23:25
  IXL013I IXLCONN REQUEST FOR STRUCTURE SYSZWLM_WORKUNIT FAILED.
  JOBNAME: WLM ASID: 000A CONNECTOR NAME: #SY1
  IXLCONN RETURN CODE: 0000000C, REASON CODE: 02010C05
  STRUCTURE NOT DEFINED IN THE CFRM ACTIVE POLICY
  CONADIAG0: 00000002
----------------------------------------------------------------------
  EVENT 02: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:23:55
  IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
  RSN: START REQUEST FAILED
----------------------------------------------------------------------
  EVENT 03: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:23:58
  IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
  RSN: START REQUEST FAILED
----------------------------------------------------------------------
  EVENT 04: HIGH - CF - SYSTEM: SY1 2009/06/12 - 13:24:02
  IXL013I IXLCONN REQUEST FOR STRUCTURE IXCT_SIGNAL FAILED.
  JOBNAME: XCFAS ASID: 0006 CONNECTOR NAME: SIGPATH_01000008
  IXLCONN RETURN CODE: 0000000C, REASON CODE: 02010C05
  STRUCTURE NOT DEFINED IN THE CFRM ACTIVE POLICY
  CONADIAG0: 00000002
----------------------------------------------------------------------
  EVENT 05: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:24:02
  IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
  RSN: START REQUEST FAILED
----------------------------------------------------------------------
  EVENT 06: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID CPU RATE: 96% ASID: 0027 JOBNAME: DAVIDZ
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
  EVENT 07: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID: 0027 JOBNAME: DAVIDZ TCB: 004E6850
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Note: In accordance with the IBM Health Checker for z/OS messaging guidelines, the largest generated output length for decimal variable values up to 2147483647 (X'7FFFFFFF') is 10 bytes. When any PFA report value is greater than 2147483647, it is displayed using multiplier notation with a maximum of six characters. For example, if the report value is 2222233333444445555, PFA displays it as 1973P (2222233333444445555 ÷ 1125899906842624, truncated to a whole number) using the following multiplier notation:
Table 2. Multiplier notation used in values for PFA reports
Name   Sym   Size
Kilo   K     1,024
Mega   M     1,048,576
Giga   G     1,073,741,824
Tera   T     1,099,511,627,776
Peta   P     1,125,899,906,842,624
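The notation can be sketched in Python as follows. The inputs come from the note and Table 2 (binary multiples, a truncated quotient, at most six characters); choosing the smallest multiplier whose result fits in six characters is an assumption that reproduces the 1973P example, not a documented PFA algorithm, and the function name is hypothetical.

```python
MAX_PLAIN = 2147483647  # values up to X'7FFFFFFF' are printed as-is
MULTIPLIERS = [("K", 2**10), ("M", 2**20), ("G", 2**30), ("T", 2**40), ("P", 2**50)]


def format_report_value(value: int) -> str:
    """Render a report value, switching to multiplier notation above MAX_PLAIN."""
    if value <= MAX_PLAIN:
        return str(value)
    for symbol, size in MULTIPLIERS:
        scaled = value // size  # truncate, as in the 1973P example
        if len(str(scaled)) + len(symbol) <= 6:
            return f"{scaled}{symbol}"
    # Assumption: values too large even for P are still shown in P units.
    return f"{value // 2**50}P"
```

For instance, 2147483648 does not fit as a K value (2097152K is eight characters), so the sketch shows it as 2048M.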
The following fields apply to all four reports:
  • Last successful model time: The date and time of the last successful model for this check. The predictions on this report were generated at that time.
  • Next model time: The date and time of the next model. The next model will recalculate the predictions.
  • Model interval: The value in the configured MODELINT parameter for this check. If PFA determines new prediction calculations are necessary, modeling can occur earlier.
  • Last successful collection time: The date and time of the last successful data collection for this check.
  • Next collection time: The date and time of the next collection.
  • Collection interval: The value in the configured COLLECTINT parameter for this check.
  • SMF arrival rate in last collection interval: The actual SMF arrival rate in the last collection interval, where the rate is defined as the number of SMF records divided by the CPU milliseconds.
  • Predicted rates based on…: The SMF arrival rates predicted from one hour, 24 hours, and seven days of data. If no prediction is available for a given time range, the line is not printed. For example, if the check has been running for two days, seven days of data is not available, so PFA does not print the "Prediction based on 7 days of data" line.
  • Runtime Diagnostics Output: Runtime Diagnostics event records to assist you in diagnosing and fixing the problem. See the topic on Runtime Diagnostics symptoms in Runtime Diagnostics.
  • Job Name: The name of the job that has SMF arrivals in the last collection interval.
  • ASID: The ASID for the job that has SMF arrivals in the last collection interval.
  • SMF Arrival Rate: The current SMF arrival rate for the job.
  • SMF Arrival Count: The SMF arrival rate, from the last interval report, for the non-persistent job.
    Note: The "SMF Arrival Count" field is unique to the non-persistent jobs exception report.
  • Predicted SMF Arrival Rate: The predicted SMF arrival rate based on one hour, 24 hours, and seven days of data. If PFA did not previously run on this system or the same jobs previously tracked are not all active, there will not be enough data for a time range until that amount of time has passed. Also, gaps in the data caused by stopping PFA or by an IPL might cause the time range to not have enough data available. After the check collects enough data for any time range, predictions are made again for that time range. If there is not enough data for a time range, INELIGIBLE is printed and comparisons are not made for that time range.
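The rule that a time range is skipped, or shown as INELIGIBLE, until enough data exists can be sketched as follows. The Python below assumes a range becomes eligible only when the collected hours, minus any gap hours, cover the whole range; PFA's real threshold is not stated here, and all names are hypothetical. The label padding and column width copy the sample reports.

```python
from typing import Dict, List, Optional

# Hours of data each time range needs (assumption: full coverage required).
RANGE_HOURS = {"1 hour": 1, "24 hours": 24, "7 days": 7 * 24}


def eligible_ranges(hours_collected: float, gap_hours: float = 0.0) -> Dict[str, bool]:
    """Decide which time ranges have enough data, after subtracting gaps."""
    usable = max(hours_collected - gap_hours, 0.0)
    return {name: usable >= need for name, need in RANGE_HOURS.items()}


def prediction_lines(predictions: Dict[str, float], eligible: Dict[str, bool]) -> List[str]:
    """Group-level lines; ranges without enough data are simply not printed."""
    lines = []
    for name in RANGE_HOURS:
        if eligible.get(name) and name in predictions:
            label = f"Prediction based on {name} of data"
            lines.append(f"{label:<36} : {predictions[name]:.2f}")
    return lines


def prediction_cell(prediction: Optional[float], eligible: bool) -> str:
    """A job's predicted-rate column, 12 characters wide like the reports."""
    if not eligible or prediction is None:
        return f"{'INELIGIBLE':>12}"
    return f"{prediction:12.2f}"
```

With two days (48 hours) of data, the 1 hour and 24 hours lines are printed and the 7 days line is omitted, matching the example in the field list above.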
Directories
Note: The content and names for these files and directories are subject to change and cannot be used as programming interfaces; these files are documented only to provide help in diagnosing problems with PFA.
pfa_directory
This directory contains all the PFA checks and is pointed to by the home directory of the started task. The following files only contain data if messages are generated by the JVM:
  • java.stderr (generated by JVM)
  • java.stdout (generated by JVM)
pfa_directory/PFA_SMF_ARRIVAL_RATE/data
The directory for SMF arrival rate that holds data and modeling results. PFA automatically deletes any contents of the PFA_SMF_ARRIVAL_RATE/data directory that could lead to skewed predictions in the future.

Guideline: If the use of the z/OS image is radically different after an IPL (for instance, the change from a test system to a production system) or if you modify the SMF record types in SMFPRMxx, delete the files in the PFA_SMF_ARRIVAL_RATE/data directory to ensure the check can collect the most accurate modeling information.
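Following the guideline above, clearing the modeling data amounts to deleting the files in the check's data directory. The Python sketch below does that under a `pfa_directory` path you supply; the function name is hypothetical, it removes only regular files, and it makes no claim about when deleting is appropriate for your installation.

```python
import tempfile
from pathlib import Path
from typing import List


def clear_model_data(pfa_directory: str) -> List[str]:
    """Delete the files in <pfa_directory>/PFA_SMF_ARRIVAL_RATE/data."""
    data_dir = Path(pfa_directory) / "PFA_SMF_ARRIVAL_RATE" / "data"
    removed = []
    for entry in data_dir.iterdir():
        if entry.is_file():  # leave any subdirectories alone
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)


if __name__ == "__main__":
    # Demonstration against a throwaway directory, never a real PFA home.
    with tempfile.TemporaryDirectory() as home:
        data = Path(home) / "PFA_SMF_ARRIVAL_RATE" / "data"
        data.mkdir(parents=True)
        (data / "SYS1SAR.OUT").write_text("sample")
        print(clear_model_data(home))
```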

Results files

  • systemName.1hr.prediction - This file is generated by the modeling code for the predictions made for one hour of historical data. It contains predictions for each of the tracked address spaces, the other persistent category, the non-persistent category, and the total system category. It also contains additional information required for PFA processing.
  • systemName.24hr.prediction - This file is generated by the modeling code for the predictions made for 24 hours of historical data. It contains predictions for each of the tracked address spaces, the other persistent category, the non-persistent category, and the total system category. It also contains additional information required for PFA processing.
  • systemName.7day.prediction - This file is generated by the modeling code for the predictions made for seven days of historical data. It contains predictions for each of the tracked address spaces, the other persistent category, the non-persistent category, and the total system category. It also contains additional information required for PFA processing.
  • systemName.1hr.prediction.html - This file contains an .html report version of the data found in the systemName.1hr.prediction file.
  • systemName.24hr.prediction.html - This file contains an .html report version of the data found in the systemName.24hr.prediction file.
  • systemName.7day.prediction.html - This file contains an .html report version of the data found in the systemName.7day.prediction file.
  • systemName.prediction.stddev - This file is generated by the modeling code and lists, for each job, the standard deviation of the predictions across the time ranges.
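The per-job value described for systemName.prediction.stddev can be illustrated with the following sketch. It takes each job's predicted arrival rate from the one-hour, 24-hour, and seven-day ranges and computes their spread. The function name, the input shape, and the choice of population standard deviation (statistics.pstdev) are assumptions. PFA's file format and exact formula are not documented here. Ranges that produced no prediction for a job (INELIGIBLE) are simply absent from the input.

```python
from statistics import pstdev


def prediction_stddev(predictions: dict[str, dict[str, float]]) -> dict[str, float]:
    """Standard deviation of each job's predicted rate across time ranges.

    predictions maps a time range name ("1hr", "24hr", "7day") to a
    {job_name: predicted_rate} dictionary. Assumption: population standard
    deviation over whichever ranges produced a prediction for the job.
    """
    per_job: dict[str, list[float]] = {}
    for rates in predictions.values():
        for job, rate in rates.items():
            per_job.setdefault(job, []).append(rate)
    # pstdev of a single value is 0.0, so a job seen in one range gets 0.0.
    return {job: pstdev(values) for job, values in per_job.items()}
```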

Data store files:

  • systemNameSAR.OUT - The data collection file.

Intermediate files:

  • systemName.sardata - This file is used as input to the modeling code to track whether enough data is available to model.
  • systemName.1hr.sardata - This file is used as input to the modeling code. It contains one hour of historical data.
  • systemName.24hr.sardata - This file is used as input to the modeling code. It contains 24 hours of historical data.
  • systemName.7day.sardata - This file is used as input to the modeling code. It contains seven days of historical data.
  • systemName.1hr.holes - This file is used to track gaps in the data, caused by stopping PFA or by an IPL, for the one-hour time period.
  • systemName.24hr.holes - This file is used to track gaps in the data, caused by stopping PFA or by an IPL, for the 24-hour time period.
  • systemName.7day.holes - This file is used to track gaps in the data, caused by stopping PFA or by an IPL, for the seven-day time period.
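The idea behind the .holes files can be illustrated with the following sketch. It finds the gaps in collection timestamps that fall within one time range's window. This is a hypothetical model of what "tracking gaps" means and does not describe the real file format. The name find_gaps, the use of seconds, and the expected collection interval are all assumptions.

```python
def find_gaps(timestamps: list[float], interval: float, window: float,
              now: float) -> list[tuple[float, float]]:
    """Return (start, end) gaps inside the last `window` seconds before `now`.

    Hypothetical model: a gap is any span between consecutive samples
    (or between a window edge and the nearest sample) that is longer than
    the expected collection `interval`, for example because PFA was
    stopped or the system was IPLed.
    """
    window_start = now - window
    points = sorted(t for t in timestamps if window_start <= t <= now)
    # Treat the window edges as boundaries so missing data at either end counts.
    edges = [window_start, *points, now]
    gaps = []
    for prev, curr in zip(edges, edges[1:]):
        if curr - prev > interval:
            gaps.append((prev, curr))
    return gaps
```

Calling the function with a one-hour, 24-hour, or seven-day window gives the gaps relevant to that range. Summing the gap lengths shows how much of the window is unusable.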

This directory holds the following log files. Additional information is written to these log files when DEBUG(1) is specified for the check.

  • systemName.1hr.cart.log - The log file generated by modeling code with details about code execution while one hour of historical data was being modeled.
  • systemName.24hr.cart.log - The log file generated by modeling code with details about code execution while 24 hours of historical data was being modeled.
  • systemName.7day.cart.log - The log file generated by modeling code with details about code execution while seven days of historical data was being modeled.
  • systemName.buildSar.log - The log file generated by intermediate code that builds the files that are input to modeling with details about code execution.
  • systemName.launcher.log - The log file generated by launcher code.
  • systemName.1hr.tree - This file is generated by the modeling code. It contains information about the model tree that was built from the last one hour of collected data.
  • systemName.24hr.tree - This file is generated by the modeling code. It contains information about the model tree that was built from the last 24 hours of collected data.
  • systemName.7day.tree - This file is generated by the modeling code. It contains information about the model tree that was built from the last seven days of collected data.
  • systemNameCONFIG.LOG - The log file containing the configuration history for the last 30 days for this check.
  • systemNameCOLLECT.LOG - The log file used during data collection.
  • systemNameMODEL.LOG - The log file used during portions of the modeling phase.
  • systemNameRUN.LOG - The log file used when the check runs.
pfa_directory/PFA_SMF_ARRIVAL_RATE/EXC_timestamp
This directory contains all the relevant data for investigating the exception that this check issued at the timestamp in the directory name. PFA keeps directories only for the last 30 exceptions. Therefore, when a new exception directory is added and more than 30 exception directories exist, the oldest directory is deleted so that only 30 remain.
  • systemNameREPORT.LOG - The log file containing the same contents as the IBM Health Checker for z/OS report for this exception as well as other diagnostic information issued during report generation (such as Runtime Diagnostic event records).
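The retention policy for exception directories can be mirrored by the following sketch. It keeps the 30 newest EXC_timestamp directories and deletes the rest. This is an illustration of the documented policy, not PFA's code. The function name is hypothetical, and the sketch assumes that the timestamp suffix sorts chronologically as text.

```python
import shutil
from pathlib import Path

MAX_EXCEPTION_DIRS = 30  # PFA keeps directories for the last 30 exceptions


def prune_exception_dirs(check_dir: Path, keep: int = MAX_EXCEPTION_DIRS) -> list[str]:
    """Delete the oldest EXC_* directories so that at most `keep` remain.

    Assumption: names sort chronologically, so the smallest names are
    the oldest exceptions. Other entries (config, data) are untouched.
    """
    exc_dirs = sorted(
        (p for p in check_dir.iterdir() if p.is_dir() and p.name.startswith("EXC_")),
        key=lambda p: p.name,
    )
    excess = exc_dirs[: max(len(exc_dirs) - keep, 0)]
    for old in excess:
        shutil.rmtree(old)
    return [p.name for p in excess]
```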
pfa_directory/PFA_SMF_ARRIVAL_RATE/config
This directory contains the configuration files for the check.
  • EXCLUDED_JOBS - The file containing the list of excluded jobs for this check.
Note: When using Runtime Diagnostics, it is possible to see data for jobs previously defined to the excluded jobs list in the "other persistent jobs" and "total system" categories because PFA must return any potential problem activity on the system identified by Runtime Diagnostics.
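The following sketch shows how an exclusion list like EXCLUDED_JOBS can filter jobs out of the tracked group. The file layout is an assumption, not a format confirmed by this section. It assumes comma-separated lines that begin with a job name and a system name, each of which may use the * and ? wildcards, and it treats lines starting with # as comments. The functions load_exclusions and is_excluded are hypothetical.

```python
from fnmatch import fnmatchcase


def load_exclusions(text: str) -> list[tuple[str, str]]:
    """Parse EXCLUDED_JOBS text into (job_pattern, system_pattern) pairs.

    Assumed layout: comma-separated fields, job name first and system
    name second, with * and ? wildcards; '#' lines and blank lines skipped.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f.strip() for f in line.split(",")]
        job = fields[0].upper()
        # A missing or empty system field is treated as "any system".
        system = fields[1].upper() if len(fields) > 1 and fields[1] else "*"
        rules.append((job, system))
    return rules


def is_excluded(job: str, system: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if the job on this system matches any exclusion rule."""
    return any(
        fnmatchcase(job.upper(), jp) and fnmatchcase(system.upper(), sp)
        for jp, sp in rules
    )
```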