PFA_MESSAGE_ARRIVAL_RATE

Description:
This check determines when an LPAR is damaged by checking the arrival rate of abnormal messages per processor millisecond used. If the arrival rate is too high, it might indicate a damaged address space or partition. If the arrival rate is too low, it might indicate a hung address space or partition.

To avoid skewing the message arrival rate, PFA ignores the first hour of message data after IPL and the last hour of message data before shutdown. In addition, PFA attempts to track the same persistent jobs that it tracked before IPL or PFA restart if the same persistent jobs are still active. (For PFA to track them, the same persistent jobs must still be active, and ten jobs must have been tracked previously.)

This check is not designed to detect performance problems that are caused by insufficient resources, faulty WLM policy, or spikes in work. However, it might help to determine if a performance problem detected by a performance monitor or WLM is caused by a damaged system.

The message arrival rate check issues an exception using four types of comparisons, which are described in more detail in the next section:

  • top persistent jobs
  • other persistent jobs
  • non-persistent jobs
  • total system
After PFA issues an exception, the next comparison type is not performed. The CONSOLE address space and any jobs that match the job and system combinations defined in the config/EXCLUDED_JOBS file that have been read for PFA processing are not included in the processing for any of the four types of comparisons. By default, an EXCLUDED_JOBS file containing all address spaces that match JES* on all systems is created during installation. Therefore, if you have not made any modifications to the EXCLUDED_JOBS file, these jobs are excluded. See Using and configuring supervised learning for more information.
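The exclusion rule described above can be pictured with a minimal sketch. It assumes the EXCLUDED_JOBS entries are already loaded as (job pattern, system pattern) pairs and that the wildcard behaves like a simple glob; the names `EXCLUDED` and `is_excluded` are hypothetical and are not part of PFA.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the EXCLUDED_JOBS entries:
# (job name pattern, system name pattern).
# The single entry mirrors the default "JES* on all systems" exclusion.
EXCLUDED = [("JES*", "*")]


def is_excluded(job_name, system_name, excluded=EXCLUDED):
    """Return True if the job is left out of all four comparison types."""
    if job_name == "CONSOLE":
        # The CONSOLE address space is never included in the comparisons.
        return True
    # Assumption: wildcards are matched glob-style against both names.
    return any(
        fnmatchcase(job_name, job_pattern) and fnmatchcase(system_name, system_pattern)
        for job_pattern, system_pattern in excluded
    )
```

For example, `is_excluded("JES2", "SY1")` is true with the default entry, while a job named PAYROLL on the same system is not excluded.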
Top persistent jobs
PFA tracks the top persistent jobs individually. Jobs are considered persistent if they start within an hour after IPL. PFA determines which jobs to track individually based on the following criteria:
  • If PFA previously ran on this system and the same 10 jobs that were previously tracked are active, PFA tracks the same jobs.
  • If PFA did not previously run on this system or the same jobs previously tracked are not all active, PFA collects data for a period of time to use in determining which jobs have the highest arrival rates. After this time passes, PFA individually tracks the jobs that have the highest arrival rates for that period.
    • During the first hour after IPL and during the time PFA is determining the jobs to track individually, normal data collection and modeling are suspended.
    • Changing the COLLECTINT or the MODELINT parameters during these times is allowed, but the changes are not used until after these times have passed.
    • Next collection and model times change automatically during these times to reflect the most accurate times known at each phase of the initial processing.
This top persistent jobs comparison is performed to determine if the message rate is higher than expected or lower than expected.
Other persistent jobs
The persistent jobs that PFA does not track individually are the other persistent jobs. PFA generates the predictions using the totals for this group. When determining if the message arrival rate is higher than expected, PFA performs the comparisons individually using a mathematical formula. When determining if the message rate is lower than expected, PFA performs the comparisons using the totals for the group.
Non-persistent jobs
The jobs that start over an hour after IPL are the non-persistent jobs. PFA performs the predictions and the comparisons using the totals for this group. This type of comparison is only used to determine if the message rate is higher than expected.
Total system
This group includes all jobs. PFA performs the predictions and the comparisons for the entire system to determine if the message rate is higher than expected or lower than expected.
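The grouping of jobs and the rule that the next comparison type is not performed after an exception can be sketched as follows. This is an illustration only: the function names are hypothetical, the comparison order is assumed to follow the order of the list above, and the individual comparisons are passed in as placeholder callables because the actual PFA calculations are not described here.

```python
PERSISTENT_WINDOW_SECONDS = 3600  # jobs that start within an hour after IPL


def classify_job(job_name, start_seconds_after_ipl, tracked_names):
    """Place a job into one of the three job groups described above."""
    if start_seconds_after_ipl > PERSISTENT_WINDOW_SECONDS:
        return "non-persistent"
    if job_name in tracked_names:
        return "top persistent"
    return "other persistent"


def first_exception(comparisons):
    """Run comparisons in order and stop at the first one that raises an exception.

    `comparisons` is an ordered list of (name, callable) pairs; each callable
    returns True when its comparison would issue an exception.
    """
    for name, comparison in comparisons:
        if comparison():
            return name  # later comparison types are not performed
    return None
```

A caller would pass the four comparison types in order, for example `first_exception([("top persistent jobs", f1), ("other persistent jobs", f2), ("non-persistent jobs", f3), ("total system", f4)])`, where `f1` through `f4` stand in for the real comparisons.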
Reason for check:
The objective of this check is to determine if an LPAR is damaged and if an address space or partition is hung by checking the arrival rate of abnormal messages per CPU millisecond used.
Best practice:
If PFA detects an unexpectedly high number of messages, the best practice is to analyze the messages being sent by the address spaces identified on the report by examining the system log to determine what is causing this burst of message activity. Establish which messages were issued around the time of the activity and review the message details. Follow the directions provided by the message to continue to diagnose and fix the problem.

If PFA detects an unexpectedly low number of messages, examine the report in SDSF for details about why the exception was issued. Use the Runtime Diagnostics output in the report to assist you in diagnosing and fixing the problem. For more information about Runtime Diagnostics see Runtime Diagnostics.

z/OS® releases the check applies to:
z/OS V1R11 and later.
Type of check:
Remote
Parameters accepted:
Yes, as follows:
Table 1. PFA_MESSAGE_ARRIVAL_RATE check parameters
Parameter name Default value Minimum Value Maximum Value Description
collectint 15 Minutes 15 360 This parameter determines how often (in minutes) to run the data collector that retrieves the current message arrival rate.
modelint 720 Minutes 60 1440 This parameter determines how often (in minutes) you want the system to analyze the data and construct a new message arrival rate model or prediction. By default, PFA analyzes the data and constructs a new model every 720 minutes (12 hours). The model interval must be at least four times larger than the collection interval. Note that, even when you set a value larger than 360, PFA performs the first model at 360 minutes (6 hours).
stddev 10 2 100 This parameter is used to specify how much variance is allowed between the actual message arrival rate per amount of CPU and the expected message arrival rate. It is used when determining if the actual message arrival rate has increased beyond the allowable upper limit. It also determines how much variance is allowed across the time range predictions. If you set the STDDEV parameter to a smaller value, an exception is issued if the actual message arrival rate is closer to the expected message arrival rate and the predictions across the time ranges are consistent. If you set the STDDEV parameter to a larger value, an exception is issued if the actual message arrival rate is significantly greater than the expected message arrival rate even if the predictions across the different time ranges are inconsistent.
collectinactive 1 (on) 0 (off) 1 (on) Defines whether data will be collected and modeled even if the check is not eligible to run (that is, not ACTIVE(ENABLED)) in IBM® Health Checker for z/OS.
trackedmin 3 0 1000 This parameter defines the minimum message arrival rate required for a persistent job in order for it to be considered a top persistent job that should be tracked individually.
exceptionmin 1 0 1000 This parameter is used when determining if an exception should be issued for an unexpectedly high message arrival rate. For tracked jobs and other persistent jobs, this parameter defines the minimum message arrival rate and the minimum predicted message arrival rate required to cause a too high exception. For non-persistent jobs and the total system comparisons, this parameter defines the minimum message arrival rate required to cause a too high exception.
checklow 1 (on) 0 (off) 1 (on) Defines whether Runtime Diagnostics is run to validate that the absence of messages is caused by a problem. If this value is off, exceptions are not issued for conditions in which the message arrival rate is unexpectedly low.
stddevlow 4 2 100 This parameter is used to specify how much variance is allowed between the actual message arrival rate per amount of CPU and the expected message arrival rate when determining if the actual rate is unexpectedly low.
  • If you set the STDDEVLOW parameter to a smaller value, an exception is issued when the actual message arrival rate is closer to the expected message arrival rate.
  • If you set the STDDEVLOW parameter to a larger value, an exception is issued when the actual message arrival rate is significantly lower than the expected message arrival rate.
limitlow 3 1 100 This parameter defines the maximum message arrival rate allowed when issuing an exception for an unexpectedly low number of messages.
debug 0 (off) 0 (off) 1 (on) This parameter (an integer of 0 or 1) is used at the direction of IBM service to generate additional diagnostic information for the IBM Support Center. This debug parameter is used in place of the IBM Health Checker for z/OS policy. The default is off (0).
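The limits in Table 1 can be checked before a policy change is made. The following is a minimal sketch, not a PFA interface: the names `PARAM_RANGES`, `DEFAULTS`, and `validate_parms` are hypothetical, the ranges and defaults are copied from the table, and "at least four times larger" is assumed to mean that MODELINT must be greater than or equal to four times COLLECTINT.

```python
# Ranges copied from Table 1: parameter name -> (minimum, maximum).
PARAM_RANGES = {
    "collectint": (15, 360),
    "modelint": (60, 1440),
    "stddev": (2, 100),
    "collectinactive": (0, 1),
    "trackedmin": (0, 1000),
    "exceptionmin": (0, 1000),
    "checklow": (0, 1),
    "stddevlow": (2, 100),
    "limitlow": (1, 100),
    "debug": (0, 1),
}

# Default values copied from Table 1.
DEFAULTS = {
    "collectint": 15, "modelint": 720, "stddev": 10, "collectinactive": 1,
    "trackedmin": 3, "exceptionmin": 1, "checklow": 1, "stddevlow": 4,
    "limitlow": 3, "debug": 0,
}


def validate_parms(overrides):
    """Return a list of problems found in the proposed parameter overrides."""
    parms = {**DEFAULTS, **overrides}
    errors = []
    for name, value in parms.items():
        if name not in PARAM_RANGES:
            errors.append(f"unknown parameter: {name}")
            continue
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside {low}..{high}")
    # Assumption: the model interval must be >= 4 x the collection interval.
    if parms["modelint"] < 4 * parms["collectint"]:
        errors.append("modelint must be at least four times collectint")
    return errors
```

With the defaults, `validate_parms({})` returns an empty list; raising COLLECTINT to 360 without also raising MODELINT to 1440 is reported as a problem.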
To determine the status of the message arrival rate check, issue f pfa,display,check(pfa_message_arrival_rate),detail. For the command example and more details, see . The following example shows the output written to message AIR018I in SDSF:
AIR018I 02:22:54 PFA CHECK DETAIL

CHECK NAME:  PFA_MESSAGE_ARRIVAL_RATE
    ACTIVE                         : YES
    TOTAL COLLECTION COUNT         : 5
    SUCCESSFUL COLLECTION COUNT    : 5
    LAST COLLECTION TIME           : 02/05/2009 10:18:22
    LAST SUCCESSFUL COLLECTION TIME: 02/05/2009 10:18:22
    NEXT COLLECTION TIME           : 02/05/2009 10:33:22
    TOTAL MODEL COUNT              : 1
    SUCCESSFUL MODEL COUNT         : 1 
    LAST MODEL TIME                : 02/05/2009 10:18:24
    LAST SUCCESSFUL MODEL TIME     : 02/05/2009 10:18:24
    NEXT MODEL TIME                : 02/05/2009 22:18:24
    CHECK SPECIFIC PARAMETERS:
       COLLECTINT                  : 15
       MODELINT                    : 720
       COLLECTINACTIVE             : 1=ON
       DEBUG                       : 0=OFF
       STDDEV                      : 10
       TRACKEDMIN                  : 3
       EXCEPTIONMIN                : 1
       CHECKLOW                    : 1=ON
       STDDEVLOW                   : 4
       LIMITLOW                    : 3
       EXCLUDED JOBS:
      NAME     SYSTEM   DATE ADDED       REASON ADDED           
      JES      *        2010/03/31 00:00 Exclude JES* jobs on ALL.
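When this output is captured for review, the "FIELD : VALUE" lines can be collected into a dictionary. The following is a minimal sketch that assumes the layout shown in the example above; `parse_check_detail` is a hypothetical helper, and the output format is not a programming interface.

```python
import re

# A field line is an uppercase name, a colon, and a non-empty value,
# for example "    COLLECTINT                  : 15".
FIELD_LINE = re.compile(r"\s*([A-Z][A-Z ]*?)\s*:\s+(\S.*)")


def parse_check_detail(text):
    """Collect the 'FIELD : VALUE' lines of an AIR018I display into a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields
```

Heading lines without a value (such as CHECK SPECIFIC PARAMETERS:), the message header line, and the rows of the excluded jobs list do not match the pattern and are skipped.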
User override of IBM values:
The following shows keywords you can use to override check values on either a POLICY statement in the HZSPRMxx parmlib member or on a MODIFY command. This statement can be copied and modified to override the check defaults:
UPDATE CHECK(IBMPFA,PFA_MESSAGE_ARRIVAL_RATE)
            ACTIVE
            SEVERITY(MEDIUM)
            INTERVAL(ONETIME)
      PARMS=('COLLECTINT(15)','MODELINT(720)','STDDEV(10)','DEBUG(0)',
            'COLLECTINACTIVE(1)','EXCEPTIONMIN(1)','TRACKEDMIN(3)',
            'CHECKLOW(1)','STDDEVLOW(4)','LIMITLOW(3)')
            DATE(20080330)
      REASON('The message arrival rate is abnormal which
      can indicate a system that is damaged.')
The message arrival rate check is designed to run automatically after every data collection. Do not change the INTERVAL parameter.
Verbose support:
The check provides additional detail in verbose mode. You can put a check into verbose mode using the UPDATE,filters,VERBOSE=ON parameters on either the MODIFY command or in a POLICY statement in an HZSPRMxx parmlib member.
Debug support:
The DEBUG parameter in IBM Health Checker for z/OS is ignored by this check. Rather, the debug parameter is a PFA check specific parameter. For details, see Understanding how to modify PFA checks.
Reference:
For more information about PFA, see the topic on Overview of Predictive Failure Analysis.
Messages:
The output is a message arrival rate prediction report that corresponds to the message issued. One of the following reports is generated:
  • AIRH152E or AIRH153E – total system exception report
  • AIRH165E or AIRH206E – tracked jobs exception report
  • AIRH166E or AIRH207E – other persistent jobs exception report
  • AIRH169E – non-persistent jobs exception report
For additional message information, see the topics on:
SECLABEL recommended for MLS users:
SYSLOW
Output:
The output is a variation of the message arrival rate prediction report. The values found in the message arrival prediction report are as follows:
Tracked top persistent jobs exception report: PFA issues the message arrival rate tracked jobs exception report when any one or more tracked, persistent jobs cause an exception. The exception can be caused by a higher than expected message arrival rate or a lower than expected message arrival rate. Only the tracked jobs that caused the exception are included in the list of jobs on the report. If the report was generated due to a lower than expected message arrival rate, the report includes Runtime Diagnostics output, which can help you diagnose the behavior. The following example is the message arrival rate tracked jobs exception report for jobs that had a higher than expected message arrival rate (for AIRH165E):
Figure 1. Message arrival rate prediction report: tracked jobs higher than expected
 Message Arrival Rate Prediction Report

Last successful model time      :  01/27/2009 17:08:01   
Next model time                 :  01/27/2009 23:08:01   
Model interval                  :  360                   
Last successful collection time :  01/27/2009 17:41:38   
Next collection time            :  01/27/2009 17:56:38   
Collection interval             :  15                    

Persistent address spaces with high rates:                                 
                                                                           
                                           Predicted Message               
                      Message                Arrival Rate                  
  Job                 Arrival                                              
  Name     ASID          Rate        1 Hour       24 Hour         7 Day    
  ________ ____  ____________  ____________  ____________  ____________    
  TRACKED1 001D         75.63         23.88         22.82         15.82    
  TRACKED2 0028         43.52          0.34         11.11         12.11    
  TRACKED3 0029         11.00         12.43          2.36          8.36    
This example is the message arrival rate tracked jobs exception report for jobs that had a lower than expected message arrival rate (for AIRH206E):
Figure 2. Message arrival rate prediction report: tracked jobs lower than expected
 Message Arrival Rate Prediction Report

Last successful model time      : 01/27/2009 17:08:01
Next model time                 : 01/27/2009 23:08:01
Model interval                  : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time            : 01/27/2009 17:56:38
Collection interval             : 15

Persistent address spaces with low rates:
                                      Predicted Message
                   Message              Arrival Rate
Job                Arrival
Name     ASID         Rate       1 Hour      24 Hour        7 Day
________ ____ ____________ ____________ ____________ ____________
JOBS4    001F         1.17        23.88        22.82        15.82
JOBS5    002D         2.01         8.34        11.11        12.11

Runtime Diagnostics Output:
Runtime Diagnostics detected a problem in job: JOBS4
  EVENT 06: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID CPU RATE: 96% ASID: 0027 JOBNAME: JOBS4
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
  EVENT 07: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID: 0027 JOBNAME: JOBS4 TCB: 004E6850
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Runtime Diagnostics detected a problem in job: JOBS5
  EVENT 03: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID CPU RATE: 96% ASID: 0027 JOBNAME: JOBS5
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
  EVENT 04: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID: 0027 JOBNAME: JOBS5 TCB: 004E6850
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Other persistent jobs exception report: PFA issues the message arrival rate other persistent jobs exception report when a comparison of a persistent job (that is not being individually tracked) causes an exception when compared to the totals of the other persistent jobs. The exception can be caused by a higher than expected message arrival rate or a lower than expected message arrival rate. The predictions listed on this report are the predicted rates for the total other persistent jobs group. The list of jobs contains only those persistent jobs (not tracked individually) that have a problem, and it is generated only for a higher than expected message arrival rate. No predictions are given for these jobs because PFA does not model individual predictions for jobs that are not tracked individually. If there is more than one job with the same name, four asterisks (****) are printed for the ASID in the report. If the report was generated due to a lower than expected message arrival rate, the report includes Runtime Diagnostics output, which can help diagnose the behavior. The following example is the message arrival rate exception report for other persistent jobs that had an unexpectedly high message arrival rate (for AIRH166E):
Figure 3. Message arrival rate prediction report: other persistent jobs with high arrival rate
Message Arrival Rate Prediction Report

Last successful model time      :  01/27/2009 17:08:01   
Next model time                 :  01/27/2009 23:08:01   
Model interval                  :  360                  
Last successful collection time :  01/27/2009 17:41:38   
Next collection time            :  01/27/2009 17:56:38   
Collection interval             :  15                    
                                           
Other persistent jobs group:                
Prediction based on 1 hour of data   : 20.27
Prediction based on 24 hours of data : 27.98
Prediction based on 7 days of data   : 31.22

Persistent address spaces with high rates: 
                                           
                      Message
  Job                 Arrival         
  Name     ASID          Rate         
  ________ ____  ____________         
  PERS1    001E         83.22         
  PERS2    0038         75.52         
  PERS3    0039         47.47     
Note: In the "other persistent jobs" and "total system" categories, when using Runtime Diagnostics, it is possible to see data for jobs previously defined to the excluded jobs list because PFA must return any potential problem activity on the system identified by Runtime Diagnostics.
The following example is the message arrival rate exception report for other persistent jobs that had an unexpectedly low message arrival rate (for AIRH207E):
Figure 4. Message arrival rate prediction report: other persistent jobs with low arrival rate
 Message Arrival Rate Prediction Report

Last successful model time      : 01/27/2009 17:08:01
Next model time                 : 01/27/2009 23:08:01
Model interval                  : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time            : 01/27/2009 17:56:38
Collection interval             : 15

Other persistent jobs group:
Prediction based on 1 hour of data   : 20.27
Prediction based on 24 hours of data : 27.98
Prediction based on 7 days of data   : 31.22

Runtime Diagnostics Output:

  EVENT 01: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID CPU RATE: 96% ASID: 0027 JOBNAME: PERS4
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
  EVENT 02: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID: 0027 JOBNAME: PERS4 TCB: 004E6850
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Non-persistent jobs exception report: PFA issues the message arrival rate non-persistent jobs exception report when the non-persistent jobs as a group cause an exception. This exception is only issued for a higher than expected message arrival rate. The message arrival rate and predictions listed on this report are calculated for the total non-persistent jobs group. The list of jobs contains only three non-persistent jobs that have high arrival counts. No predictions are given for these jobs because PFA does not model individual predictions for jobs that are not tracked individually. The following example is the message arrival rate non-persistent jobs exception report (for AIRH169E):
Figure 5. Message arrival rate prediction report: non-persistent jobs higher than expected
Message Arrival Rate Prediction Report

Last successful model time      :  01/27/2009 17:08:01   
Next model time                 :  01/27/2009 23:08:01   
Model interval                  :  360                  
Last successful collection time :  01/27/2009 17:41:38   
Next collection time            :  01/27/2009 17:56:38   
Collection interval             :  15                    

Non-persistent jobs group:
 Message arrival rate                     
   in last collection interval        : 65.49
 Prediction based on 1 hour of data   : 20.27
 Prediction based on 24 hours of data : 27.98
 Prediction based on 7 days of data   : 31.22
                                             
Address spaces with high arrivals:

                      Message
  Job                 Arrival             
  Name     ASID        Counts        
  ________ ____  ____________  
  NONPERS1 001F            83           
  NONPERS2 0048            52             
  NONPERS3 0049            47
No problem and total system exception report: PFA issues the message arrival rate system report when no exception is issued or when the total message arrival rate exception is issued. When there is no problem or when an exception occurs due to a higher than expected message arrival rate, the list of jobs contains all of the jobs being tracked individually. The Runtime Diagnostics section is written in the report when the exception is issued due to a lower than expected message arrival rate. When the report is issued due to a lower than expected message arrival rate, the list of tracked jobs is not printed on the report. The following example is the message arrival rate total system exception report issued due to a higher than expected message arrival rate (for AIRH152E):
Figure 6. Message arrival rate prediction report: total system higher than expected
            Message Arrival Rate Prediction Report

Last successful model time      :  01/27/2009 17:08:01   
Next model time                 :  01/27/2009 23:08:01   
Model interval                  :  360                  
Last successful collection time :  01/27/2009 17:41:38   
Next collection time            :  01/27/2009 17:56:38   
Collection interval             :  15                    

Message arrival rate 
at last collection interval          :       83.52
Prediction based on 1 hour of data   :       98.27
Prediction based on 24 hours of data :       85.98
Prediction based on 7 days of data   :      100.22

Top persistent users:                                                 
                                                                           
                                           Predicted Message               
                      Message                Arrival Rate                  
  Job                 Arrival                                              
  Name     ASID          Rate        1 Hour       24 Hour         7 Day    
  ________ ____  ____________  ____________  ____________  ____________    
  JOB1     001D         58.00         23.88         22.82         15.82    
  JOB2     0028         11.00          0.34         11.11         12.11    
  JOB3     0029         11.00         12.43          2.36          8.36    
  ...
This example is the message arrival rate total system exception report issued due to a lower than expected message arrival rate (for AIRH153E) with Runtime Diagnostics output:
Figure 7. Message arrival rate prediction report: total system lower than expected
Message Arrival Rate Prediction Report

Last successful model time      : 01/27/2009 17:08:01
Next model time                 : 01/27/2009 23:08:01
Model interval                  : 360
Last successful collection time : 01/27/2009 17:41:38
Next collection time            : 01/27/2009 17:56:38
Collection interval             : 15

Message arrival rate
  in last collection interval        : 2.02
Prediction based on 1 hour of data   : 98.27
Prediction based on 24 hours of data : 85.98
Prediction based on 7 days of data   : 100.22

Runtime Diagnostics Output:

  EVENT 01: HIGH - CF - SYSTEM: SY1 2009/06/12 - 13:23:25
  IXL013I IXLCONN REQUEST FOR STRUCTURE SYSZWLM_WORKUNIT FAILED.
  JOBNAME: WLM ASID: 000A CONNECTOR NAME: #SY1
  IXLCONN RETURN CODE: 0000000C, REASON CODE: 02010C05
  STRUCTURE NOT DEFINED IN THE CFRM ACTIVE POLICY
  CONADIAG0: 00000002
----------------------------------------------------------------------
  EVENT 02: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:23:55
  IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
  RSN: START REQUEST FAILED
----------------------------------------------------------------------
  EVENT 03: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:23:58
  IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
  RSN: START REQUEST FAILED
----------------------------------------------------------------------
  EVENT 04: HIGH - CF - SYSTEM: SY1 2009/06/12 - 13:24:02
  IXL013I IXLCONN REQUEST FOR STRUCTURE IXCT_SIGNAL FAILED.
  JOBNAME: XCFAS ASID: 0006 CONNECTOR NAME: SIGPATH_01000008
  IXLCONN RETURN CODE: 0000000C, REASON CODE: 02010C05
  STRUCTURE NOT DEFINED IN THE CFRM ACTIVE POLICY
  CONADIAG0: 00000002
----------------------------------------------------------------------
  EVENT 05: HIGH - XCF - SYSTEM: SY1 2009/06/12 - 13:24:02
  IXC467I STOPPING PATHIN STRUCTURE IXCT_SIGNAL
  RSN: START REQUEST FAILED
----------------------------------------------------------------------
  EVENT 06: HIGH - HIGHCPU - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID CPU RATE: 96% ASID: 0027 JOBNAME: DAVIDZ
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE USING EXCESSIVE CPU TIME. IT MAY BE LOOPING.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
  EVENT 07: HIGH - LOOP - SYSTEM: SY1 2009/06/12 - 13:28:46
  ASID: 0027 JOBNAME: DAVIDZ TCB: 004E6850
  STEPNAME: DAVIDZ PROCSTEP: DAVIDZ JOBID: STC00042 USERID: ++++++++
  JOBSTART: 2009/06/12 - 13:28:35
Error: 
  ADDRESS SPACE APPEARS TO BE IN A LOOP.
Action: 
  USE YOUR SOFTWARE MONITORS TO INVESTIGATE THE ASID.
----------------------------------------------------------------------
Note: In accordance with the IBM Health Checker for z/OS messaging guidelines, the largest generated output length for decimal variable values up to 2147483647 (X'7FFFFFFF') is 10 bytes. When any PFA report value is greater than 2147483647, it displays using multiplier notation with a maximum of six characters. For example, if the report value is 2222233333444445555, PFA displays it as 1973P (2222233333444445555 ÷ 1125899906842624) using the following multiplier notation:
Table 2. Multiplier notation used in values for PFA reports
Name Sym Size
Kilo K 1,024
Mega M 1,048,576
Giga G 1,073,741,824
Tera T 1,099,511,627,776
Peta P 1,125,899,906,842,624
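The multiplier notation can be reproduced with a short sketch. The truncating division follows from the 1973P example in the note; how PFA chooses between multipliers is not stated, so this sketch assumes the smallest multiplier whose result fits in six characters. The function name `format_report_value` is hypothetical.

```python
MAX_PLAIN_VALUE = 2147483647  # X'7FFFFFFF', printed as-is in up to 10 bytes
MAX_SCALED_LENGTH = 6         # maximum characters in multiplier notation

# Multipliers from the table, smallest first.
MULTIPLIERS = [
    ("K", 1024),
    ("M", 1024 ** 2),
    ("G", 1024 ** 3),
    ("T", 1024 ** 4),
    ("P", 1024 ** 5),
]


def format_report_value(value):
    """Format a non-negative integer the way the note above describes."""
    if value <= MAX_PLAIN_VALUE:
        return str(value)
    text = ""
    for symbol, size in MULTIPLIERS:
        # Integer division truncates, which matches 1973P in the example.
        text = f"{value // size}{symbol}"
        # Assumption: use the smallest multiplier that fits in six characters.
        if len(text) <= MAX_SCALED_LENGTH:
            return text
    return text  # fall back to the largest multiplier
```

Under these assumptions, `format_report_value(2222233333444445555)` gives `1973P`, which agrees with the example in the note.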

The following fields apply to all reports:

  • Last successful model time: The date and time of the last successful model for this check. The predictions on this report were generated at that time.
  • Next model time: The date and time of the next model. The next model will recalculate the predictions.
  • Model interval: The value in the configured MODELINT parameter for this check. If PFA determines new prediction calculations are necessary, modeling can occur earlier.
  • Last successful collection time: The date and time of the last successful data collection for this check.
  • Next collection time: The date and time of the next collection.
  • Collection interval: The value in the configured COLLECTINT parameter for this check.
  • Message arrival rate in last collection interval: The actual message arrival rate in the last collection interval where the rate is defined to be the number of messages divided by the CPU milliseconds.
  • Predicted rates based on…: The message arrival rates based on one hour, 24 hours, and seven days. If no prediction is available for a given time range, the line is not printed. For example, if the check has been running for two days, seven days of data is not available and the "Prediction based on 7 days of data" line is not printed.
  • Runtime Diagnostics Output: Runtime Diagnostics event records to assist you in diagnosing and fixing the problem. See the topic on Runtime Diagnostics symptoms in Runtime Diagnostics.
  • Job Name: The name of the job that has message arrivals in the last collection interval.
  • ASID: The ASID for the job that has message arrivals in the last collection interval.
  • Message Arrival Rate: The current message arrival rate for the persistent job.
  • Message Arrival Counts: The message arrival count for the non-persistent job.
    Note: The "Message Arrival Counts" field is unique to the non-persistent jobs exception report.
  • Predicted Message Arrival Rate: The predicted message arrival rate based on one hour, 24 hours, and seven days of data. If PFA did not previously run on this system or the same jobs previously tracked are not all active, there will not be enough data for a time range until that amount of time has passed. Also, gaps in the data caused by stopping PFA or by an IPL might cause the time range to not have enough data available. After the check collects enough data for any time range, predictions are made again for that time range. If there is not enough data for a time range, INELIGIBLE is printed and comparisons are not made for that time range.
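Two of the field definitions above lend themselves to a small illustration: the rate is the number of messages divided by the CPU milliseconds, and a time range without enough data is shown as INELIGIBLE. The following is a minimal sketch with hypothetical function names; returning 0.0 when no CPU time was used is an assumption made here to avoid a division by zero, not documented PFA behavior.

```python
def message_arrival_rate(message_count, cpu_milliseconds):
    """Rate as defined above: number of messages divided by CPU milliseconds."""
    if cpu_milliseconds <= 0:
        # Assumption: treat an interval with no CPU time as a rate of zero.
        return 0.0
    return message_count / cpu_milliseconds


def format_prediction(prediction):
    """Render a predicted rate, or INELIGIBLE when the time range has no data."""
    if prediction is None:
        return "INELIGIBLE"
    return f"{prediction:.2f}"
```

For example, 150 messages over 60 CPU milliseconds is a rate of 2.5, and a time range that has no prediction yet is rendered as INELIGIBLE.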
Directories
Note: The content and names for these files and directories are subject to change and cannot be used as programming interfaces; these files are documented only to provide help in diagnosing problems with PFA.
pfa_directory
This directory contains all the PFA checks and is pointed to by the home directory of the started task. The following files only contain data if messages are generated by the JVM:
  • java.stderr (generated by JVM)
  • java.stdout (generated by JVM)
pfa_directory/PFA_MESSAGE_ARRIVAL_RATE/data
The directory for message arrival rate that holds data and modeling results.

Guideline: If the use of the z/OS image is radically different after an IPL (for instance, the change from a test system to a production system), delete the files in the PFA_MESSAGE_ARRIVAL_RATE/data directory to enable the check to collect the most accurate modeling information.

Results files:

  • systemName.1hr.prediction - This file is generated by the modeling code for the predictions made for one hour of historical data. It contains predictions for each of the tracked address spaces, the other persistent category, the non-persistent category, and the total system category. It also contains additional information required for PFA processing.
  • systemName.24hr.prediction - This file is generated by the modeling code for the predictions made for 24 hours of historical data. It contains predictions for each of the tracked address spaces, the other persistent category, the non-persistent category, and the total system category. It also contains additional information required for PFA processing.
  • systemName.7day.prediction - This file is generated by the modeling code for the predictions made for seven days of historical data. It contains predictions for each of the tracked address spaces, the other persistent category, the non-persistent category, and the total system category. It also contains additional information required for PFA processing.
  • systemName.1hr.prediction.html - This file lists the persistent address spaces in an .html report format for the predictions made for one hour of historical data.
  • systemName.24hr.prediction.html - This file lists the persistent address spaces in an .html report format for the predictions made for 24 hours of historical data.
  • systemName.7day.prediction.html - This file lists the persistent address spaces in an .html report format for the predictions made for seven days of historical data.
  • systemName.prediction.stddev - The file generated by the modeling code to list the standard deviation of the predictions across the time ranges for each address space.

Data store files:

  • systemNameMAR.OUT - The data collection file.

Intermediate files:

  • systemName.mardata - The file used as input to the modeling code to track whether enough data is available to model.
  • systemName.1hr.mardata - The file used as input to modeling code. It contains one hour of historical data.
  • systemName.24hr.mardata - The file used as input to modeling code. It contains 24 hours of historical data.
  • systemName.7day.mardata - The file used as input to modeling code. It contains seven days of historical data.
  • systemName.1hr.holes - The file used to track gaps in the data for a one-hour time period. Gaps are caused by stopping PFA or by an IPL.
  • systemName.24hr.holes - The file used to track gaps in the data for a 24-hour time period. Gaps are caused by stopping PFA or by an IPL.
  • systemName.7day.holes - The file used to track gaps in the data for a seven-day time period. Gaps are caused by stopping PFA or by an IPL.

This directory holds the following log files. Additional information is written to these log files when DEBUG(1) is in effect.

  • systemName.1hr.cart.log - The log file generated by modeling code with details about code execution while one hour of historical data was being modeled.
  • systemName.24hr.cart.log - The log file generated by modeling code with details about code execution while 24 hours of historical data was being modeled.
  • systemName.7day.cart.log - The log file generated by modeling code with details about code execution while seven days of historical data was being modeled.
  • systemName.1hr.tree - This file is generated by the modeling code. It contains information about the model tree which was built based on the last one hour of collected data.
  • systemName.24hr.tree - This file is generated by the modeling code. It contains information about the model tree which was built based on the last 24 hours of collected data.
  • systemName.7day.tree - This file is generated by the modeling code. It contains information about the model tree which was built based on the last seven days of collected data.
  • systemName.buildMar.log - The log file generated by intermediate code that builds the files that are input to modeling with details about code execution.
  • systemName.launcher.log - The log file generated by launcher code.
  • systemNameCONFIG.LOG - The log file containing the configuration history for the last 30 days for this check.
  • systemNameCOLLECT.LOG - The log file used during data collection.
  • systemNameMODEL.LOG - The log file used during portions of the modeling phase.
  • systemNameRUN.LOG - The log file used when the check runs.
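The naming pattern of the files listed above can be summarized in a short sketch, which might help when checking whether a data directory is complete during problem diagnosis. The function name and the grouping keys are assumptions made for illustration, and "SY1" in the usage note is a hypothetical system name. Because the file names are subject to change, treat this as a diagnostic aid only, not as a programming interface.

```python
# Illustrative sketch: builds the file names documented for the
# PFA_MESSAGE_ARRIVAL_RATE/data directory. The function and the dictionary
# keys are hypothetical; only the name patterns come from the list above.
TIME_RANGES = ("1hr", "24hr", "7day")


def expected_data_files(system_name):
    """Return the documented data-directory file names, grouped by purpose."""

    def per_range(suffix):
        # One file per modeled time range, for example SY1.24hr.prediction
        return [f"{system_name}.{r}.{suffix}" for r in TIME_RANGES]

    return {
        "results": per_range("prediction")
        + per_range("prediction.html")
        + [f"{system_name}.prediction.stddev"],
        "data_store": [f"{system_name}MAR.OUT"],
        "intermediate": [f"{system_name}.mardata"]
        + per_range("mardata")
        + per_range("holes"),
        "logs": per_range("cart.log")
        + per_range("tree")
        + [f"{system_name}.buildMar.log", f"{system_name}.launcher.log"]
        + [f"{system_name}{name}.LOG" for name in ("CONFIG", "COLLECT", "MODEL", "RUN")],
    }
```

For example, expected_data_files("SY1")["data_store"] yields the single name SY1MAR.OUT. Files might legitimately be absent if a time range has not yet been modeled.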
pfa_directory/PFA_MESSAGE_ARRIVAL_RATE/EXC_timestamp
This directory contains all the relevant data for investigating exceptions issued by this check at the timestamp provided in the directory name. PFA keeps directories only for the last 30 exceptions. Therefore, at each exception, if more than 30 exception directories exist, the oldest directory is deleted so that only 30 exception directories remain after the latest exception is added.
  • systemNameREPORT.LOG - The log file containing the same contents as the IBM Health Checker for z/OS report for this exception as well as other diagnostic information issued during report generation.
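The retention rule for exception directories can be sketched as follows. This is a minimal illustration, not PFA's implementation: the function name is hypothetical, and the sketch assumes that the timestamp portion of each EXC_timestamp name sorts chronologically as text, which is not stated here.

```python
# Hypothetical sketch of the "keep the last 30 exceptions" rule.
# Assumption: directory names sort oldest-first as plain strings.
def directories_to_delete(existing, keep=30):
    """Return the oldest directory names beyond the retention limit.

    existing: names of the exception directories, including the newest one.
    keep:     number of exception directories to retain (30 for this check).
    """
    ordered = sorted(existing)
    excess = len(ordered) - keep
    return ordered[:excess] if excess > 0 else []
```

With 31 directories present after a new exception, the function returns only the oldest name, which leaves 30 directories in place.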
pfa_directory/PFA_MESSAGE_ARRIVAL_RATE/config
This directory contains the configuration files for the check.
  • EXCLUDED_JOBS - The file containing the list of excluded jobs for this check.
    Note: When using Runtime Diagnostics, it is possible to see data for jobs previously defined to the excluded jobs list in the "other persistent jobs" and "total system" categories because PFA must return any potential problem activity on the system identified by Runtime Diagnostics.