DB2 10.5 for Linux, UNIX, and Windows

Historical monitoring with WLM event monitors

DB2® workload management uses event monitors to capture information that might be of use in the future or for historical analysis.

Three event monitors are available for you to use. Each event monitor serves a different purpose:
Activity event monitor
This monitor captures information about individual activities in a service class, workload, or work class or activities that violated a threshold. The amount of data that is captured for each activity is configurable and should be considered when you determine the amount of disk space and the length of time required to keep the monitor data. A common use for activity data is to use it as input to tools such as db2advis or to use access plans (from the explain utility) to help determine table, column, and index usage for a set of queries.

You can collect information about an activity by specifying COLLECT ACTIVITY DATA for the service class, workload, or work action to which such an activity belongs or a threshold that might be violated by such an activity. The information is collected when the activity completes, regardless of whether the activity completes successfully.

Note that if an activities event monitor is active when the database deactivates, any backlogged activity records in the queue are discarded. To ensure that you obtain all activities event monitor records and that none are discarded, explicitly deactivate the activities event monitor first before deactivating the database. When an activities event monitor is explicitly deactivated, all backlogged activity records in the queue are processed before the event monitor deactivates.

Threshold violations event monitor
This monitor captures information when a threshold is violated. It indicates what threshold was violated, the activity that caused the violation, and what action was taken when it occurred.

If you specify COLLECT ACTIVITY DATA for the threshold and an activities event monitor is created and active, information is also collected about activities that violate the threshold, but this information is collected when the activity ends (either successfully or unsuccessfully).

You can obtain details about a threshold by querying the SYSCAT.THRESHOLDS view.

Statistics event monitor
This monitor serves as a low-overhead alternative to capturing detailed activity information by collecting aggregate data (for example, the number of activities completed and average execution time). Aggregate data includes histograms for a number of activity measurements including lifetime, queue time, execution time and estimated cost. You can use histograms to understand the distribution of values, identify outliers, and compute additional statistics such as averages and standard deviations. For example, histograms can help you understand the variation in lifetime that users experience. The average life time alone does not reflect what a user experiences if there is a high degree of variability. See Collecting workload management statistics using a statistics event monitor for a description of how to send statistics to the event monitor.

The following figure shows the different monitoring options available to access workload information: table functions to access real-time statistics, and activity details and historical information captured as efficient aggregates or as details about individual activities:

Figure 1. Workload management with monitoring
Workload management with monitoring

Unlike statement, connection, and transaction event monitors, the activity, statistics, and threshold violations event monitors do not have event conditions (that is, conditions specified on the WHERE keyword of the CREATE EVENT MONITOR statement). Instead, these event monitors rely on the attributes of service classes, workloads, work classes, and thresholds to determine whether these objects send their activity information or aggregate information to these monitors.

Typically, event monitors write data to either tables or files. You need to prune these tables or files periodically because they are not automatically pruned.

You can use the wlmevmon.ddl script in the sqllib/misc directory to create and enable three event monitors called DB2ACTIVITIES, DB2STATISTICS, and DB2THRESHOLDVIOLATIONS. If necessary, modify the script to change the table space or other parameters.

Example

Example: Identify queries with a large estimated cost using the statistics event monitor: You suspect that your database workload occasionally includes large, expensive queries, possibly due to the poor optimization of the queries themselves. You want to identify these queries so that you can prevent them from consuming excessive resources on your system, with a long-term goal of perhaps rewriting some of the queries to improve performance. The statistics event monitor provides you with a low-overhead way to measure the estimated cost of your queries which you can then use to determine what the maximum acceptable estimated cost for a query on your data server should be. A query that is poorly optimized is typically distinguished by a large estimated cost that is many times larger than the estimated cost of most other queries.

To get started, you need to create and activate a statistics event monitor and to start collecting extended aggregate activity data for the service class where the queries run:

CREATE EVENT MONITOR DB2STATISTICS
  FOR STATISTICS WRITE TO TABLE

SET EVENT MONITOR DB2STATISTICS STATE 1
In this example, all queries run in the SYSDEFAULTSUBCLASS subclass of the SYSDEFAULTUSERCLASS service class, which you can alter to collect the required data:
ALTER SERVICE CLASS SYSDEFAULTSUBCLASS UNDER SYSDEFAULTUSERCLASS
   COLLECT AGGREGATE ACTIVITY DATA EXTENDED

A full day of work might provide a reasonable approximation of the range of queries your data server typically processes. At the end of the day, you can copy the statistics collected from memory to the statistics event monitor by running the WLM_COLLECT_STATS stored procedure:

CALL WLM_COLLECT_STATS()

Included with the different statistics written to the event monitor tables are the estimated cost statistics of queries. To see them, you can query the service class statistics table SCSTATS_DB2STATISTICS:

SELECT STATISTICS_TIMESTAMP,
       COORD_ACT_EST_COST_AVG,
       COST_ESTIMATE_TOP
FROM SCSTATS_DB2STATISTICS
WHERE SERVICE_SUPERCLASS_NAME = 'SYSDEFAULTUSERCLASS'
  AND SERVICE_SUBCLASS_NAME = 'SYSDEFAULTSUBCLASS'
STATISTICS_TIMESTAMP       COORD_ACT_EST_COST_AVG COST_ESTIMATE_TOP
-------------------------- ---------------------- --------------------
2008-09-03-09.49.04.455979                 169440             13246445

  1 record(s) selected.

The output shows that the average query has an estimated cost in the range of hundreds of thousands of timerons, and that the largest queries have estimated costs larger than ten million timerons. You can confirm that queries of ten million or more timerons are outliers by looking at the estimated cost histogram, which was generated at the same time that the average and high watermarks shown in the output were written to the event monitor table. You can look at the histogram by querying the HISTOGRAMBIN_DB2STATISTICS table as follows:

SELECT STATISTICS_TIMESTAMP,
       TOP,
       NUMBER_IN_BIN
FROM HISTOGRAMBIN_DB2STATISTICS HIST,
     SYSCAT.SERVICECLASSES SC
WHERE HIST.SERVICE_CLASS_ID = SC.SERVICECLASSID
  AND SC.PARENTSERVICECLASSNAME = 'SYSDEFAULTUSERCLASS'
  AND SC.SERVICECLASSNAME = 'SYSDEFAULTSUBCLASS'
  AND HISTOGRAM_TYPE = 'COORDACTESTCOST'"
STATISTICS_TIMESTAMP       TOP                  NUMBER_IN_BIN
-------------------------- -------------------- --------------------
2008-09-03-09.49.04.455979                    1                    0
2008-09-03-09.49.04.455979                    2                    0
2008-09-03-09.49.04.455979                    3                    0
2008-09-03-09.49.04.455979                    5                    0
2008-09-03-09.49.04.455979                    8                    0
2008-09-03-09.49.04.455979                   12                    1
2008-09-03-09.49.04.455979                   19                    0
2008-09-03-09.49.04.455979                   29                    0
2008-09-03-09.49.04.455979                   44                    2
2008-09-03-09.49.04.455979                   68                    5
2008-09-03-09.49.04.455979                  103                   22
2008-09-03-09.49.04.455979                  158                   14
2008-09-03-09.49.04.455979                  241                   54
2008-09-03-09.49.04.455979                  369                    2
2008-09-03-09.49.04.455979                  562                  142
2008-09-03-09.49.04.455979                  858                   21
2008-09-03-09.49.04.455979                 1309                  123
2008-09-03-09.49.04.455979                 1997                  512
2008-09-03-09.49.04.455979                 3046                  643
2008-09-03-09.49.04.455979                 4647                  201
2008-09-03-09.49.04.455979                 7089                  875
2008-09-03-09.49.04.455979                10813                 1445
2008-09-03-09.49.04.455979                16493                 5386
2008-09-03-09.49.04.455979                25157                 2409
2008-09-03-09.49.04.455979                38373                 8940
2008-09-03-09.49.04.455979                58532                 9820
2008-09-03-09.49.04.455979                89280                 2149
2008-09-03-09.49.04.455979               136181                  798
2008-09-03-09.49.04.455979               207720                 2411
2008-09-03-09.49.04.455979               316840                14989
2008-09-03-09.49.04.455979               483283                 9831
2008-09-03-09.49.04.455979               737162                 1451
2008-09-03-09.49.04.455979              1124409                  213
2008-09-03-09.49.04.455979              1715085                   24
2008-09-03-09.49.04.455979              2616055                    1
2008-09-03-09.49.04.455979              3990325                    0
2008-09-03-09.49.04.455979              6086529                    0
2008-09-03-09.49.04.455979              9283913                    0
2008-09-03-09.49.04.455979             14160950                    3
2008-09-03-09.49.04.455979             21600000                    0
2008-09-03-09.49.04.455979                   -1                    0

In the histogram, the value in the number_in_bin column for queries whose top is greater than 2616055 is zero until top reaches 14160950, where the number_in_bin becomes 3. These three queries are outliers and can be controlled with an ESTIMATEDSQLCOST threshold set to trigger if the estimated cost of a query exceeds 10 million timerons which you can use to prevent such activities from executing and to monitor them more closely.

Example: Using the threshold violations event monitor: To control activities of a certain estimated cost, you want to define an ESTIMATEDSQLCOST threshold on your workload that applies only to that subset of your total workload exceeding a certain estimated cost. Having looked at the estimated cost histogram, you determined that activities with an estimated cost in the range of 0 to under 3 million timerons occur frequently and that activities with an estimated cost over 10 million timerons occur rarely (perhaps only a few times a day and perhaps always due to some flaw in the query, such as the use of a Cartesian join).

To verify that a threshold of 10 million timerons is effective in stopping those few activities a day that should not be allowed to run, you can create and activate a threshold event monitor:

CREATE THRESHOLD TH1
  FOR DATABASE ACTIVITIES
  ENFORCEMENT DATABASE
  WHEN ESTIMATEDSQLCOST > 10000000
  STOP EXECUTION

CREATE EVENT MONITOR DB2THRESHOLDVIOLATIONS
  FOR THRESHOLD VIOLATIONS
  WRITE TO TABLE

SET EVENT MONITOR DB2THRESHOLDVIOLATIONS STATE 1

After the end of the day, you can see what threshold violations occurred by querying the threshold violations table:

SELECT THRESHOLDID,
       SUBSTR(THRESHOLD_PREDICATE, 1, 20) PREDICATE,
       TIME_OF_VIOLATION,
       THRESHOLD_MAXVALUE,
       THRESHOLD_ACTION
FROM THRESHOLDVIOLATIONS_DB2THRESHOLDVIOLATIONS
ORDER BY TIME_OF_VIOLATION, THRESHOLDID
THRESHOLDID PREDICATE            TIME_OF_VIOLATION          THRESHOLD_MAXVALUE   THRESHOLD_ACTION
----------- -------------------- -------------------------- -------------------- ----------------
          1 EstimatedSQLCost     2008-09-02-22.39.10.000000             10000000 Stop

  1 record(s) selected.

Example: Using the activity event monitor

The previous example showed how you can collect threshold information in an event monitor table to confirm that activities with a large estimated cost are being prevented from executing by a threshold. After seeing these threshold violations, you want to determine what the SQL statement texts producing these large queries are, so that you can use the explain facility to determine if an index is needed on the tables being queried.

Collecting this additional information requires creating and activating an activity event monitor and altering the threshold to turn on activity collection with details:

CREATE EVENT MONITOR DB2ACTIVITIES
  FOR ACTIVITIES WRITE TO TABLE

SET EVENT MONITOR DB2ACTIVITIES STATE 1

ALTER THRESHOLD TH1
  WHEN EXCEEDED
  COLLECT ACTIVITY DATA WITH DETAILS 

When you query the threshold violations table again after another business day has passed, you can perform a join with the ACTIVITYSTMT_DB2ACTIVITIES table to see the SQL statement text of any activity that violated the threshold:

SELECT THRESHOLDID,
       SUBSTR(THRESHOLD_PREDICATE, 1, 20) PREDICATE,
       TIME_OF_VIOLATION,
       SUBSTR(STMT_TEXT,1,70) STMT_TEXT
FROM THRESHOLDVIOLATIONS_DB2THRESHOLDVIOLATIONS TV,
     ACTIVITYSTMT_DB2ACTIVITIES A
WHERE TV.APPL_ID = A.APPL_ID
  AND TV.UOW_ID = A.UOW_ID
  AND TV.ACTIVITY_ID = A.ACTIVITY_ID
THRESHOLDID PREDICATE            TIME_OF_VIOLATION          STMT_TEXT
----------- -------------------- -------------------------- ----------------------------------------------------------------------
          1 EstimatedSQLCost     2008-09-02-23.04.49.000000 select count(*) from syscat.tables,syscat.tables,syscat.tables

  1 record(s) selected.