DB2® workload management uses event monitors to capture information that might be of use in the future or for historical analysis.
You can collect information about an activity by specifying COLLECT ACTIVITY DATA for the service class, workload, or work action to which such an activity belongs or a threshold that might be violated by such an activity. The information is collected when the activity completes, regardless of whether the activity completes successfully.
Note that if an activities event monitor is active when the database deactivates, any backlogged activity records in the queue are discarded. To ensure that you obtain all activities event monitor records and that none are discarded, explicitly deactivate the activities event monitor first before deactivating the database. When an activities event monitor is explicitly deactivated, all backlogged activity records in the queue are processed before the event monitor deactivates.
If you specify COLLECT ACTIVITY DATA for the threshold and an activities event monitor is created and active, information is also collected about activities that violate the threshold, but this information is collected when the activity ends (either successfully or unsuccessfully).
You can obtain details about a threshold by querying the SYSCAT.THRESHOLDS view.
The following figure shows the different monitoring options available to access workload information: table functions to access real-time statistics, and activity details and historical information captured as efficient aggregates or as details about individual activities:
Unlike statement, connection, and transaction event monitors, the activity, statistics, and threshold violations event monitors do not have event conditions (that is, conditions specified on the WHERE keyword of the CREATE EVENT MONITOR statement). Instead, these event monitors rely on the attributes of service classes, workloads, work classes, and thresholds to determine whether these objects send their activity information or aggregate information to these monitors.
Typically, event monitors write data to either tables or files. You need to prune these tables or files periodically because they are not automatically pruned.
You can use the wlmevmon.ddl script in the sqllib/misc directory to create and enable three event monitors called DB2ACTIVITIES, DB2STATISTICS, and DB2THRESHOLDVIOLATIONS. If necessary, modify the script to change the table space or other parameters.
Example: Identify queries with a large estimated cost using the statistics event monitor: You suspect that your database workload occasionally includes large, expensive queries, possibly due to the poor optimization of the queries themselves. You want to identify these queries so that you can prevent them from consuming excessive resources on your system, with a long-term goal of perhaps rewriting some of the queries to improve performance. The statistics event monitor provides you with a low-overhead way to measure the estimated cost of your queries which you can then use to determine what the maximum acceptable estimated cost for a query on your data server should be. A query that is poorly optimized is typically distinguished by a large estimated cost that is many times larger than the estimated cost of most other queries.
To get started, you need to create and activate a statistics event monitor and to start collecting extended aggregate activity data for the service class where the queries run:
CREATE EVENT MONITOR DB2STATISTICS
FOR STATISTICS WRITE TO TABLE
SET EVENT MONITOR DB2STATISTICS STATE 1
In this example,
all queries run in the SYSDEFAULTSUBCLASS subclass of the SYSDEFAULTUSERCLASS
service class, which you can alter to collect the required data: ALTER SERVICE CLASS SYSDEFAULTSUBCLASS UNDER SYSDEFAULTUSERCLASS
COLLECT AGGREGATE ACTIVITY DATA EXTENDED
A full day of work might provide a reasonable approximation of the range of queries your data server typically processes. At the end of the day, you can copy the statistics collected from memory to the statistics event monitor by running the WLM_COLLECT_STATS stored procedure:
CALL WLM_COLLECT_STATS()
Included with the different statistics written to the event monitor tables are the estimated cost statistics of queries. To see them, you can query the service class statistics table SCSTATS_DB2STATISTICS:
SELECT STATISTICS_TIMESTAMP,
COORD_ACT_EST_COST_AVG,
COST_ESTIMATE_TOP
FROM SCSTATS_DB2STATISTICS
WHERE SERVICE_SUPERCLASS_NAME = 'SYSDEFAULTUSERCLASS'
AND SERVICE_SUBCLASS_NAME = 'SYSDEFAULTSUBCLASS'
STATISTICS_TIMESTAMP COORD_ACT_EST_COST_AVG COST_ESTIMATE_TOP
-------------------------- ---------------------- --------------------
2008-09-03-09.49.04.455979 169440 13246445
1 record(s) selected.
The output shows that the average query has an estimated cost in the range of hundreds of thousands of timerons, and that the largest queries have estimated costs larger than ten million timerons. You can confirm that queries of ten million or more timerons are outliers by looking at the estimated cost histogram, which was generated at the same time that the average and high watermarks shown in the output were written to the event monitor table. You can look at the histogram by querying the HISTOGRAMBIN_DB2STATISTICS table as follows:
SELECT STATISTICS_TIMESTAMP,
TOP,
NUMBER_IN_BIN
FROM HISTOGRAMBIN_DB2STATISTICS HIST,
SYSCAT.SERVICECLASSES SC
WHERE HIST.SERVICE_CLASS_ID = SC.SERVICECLASSID
AND SC.PARENTSERVICECLASSNAME = 'SYSDEFAULTUSERCLASS'
AND SC.SERVICECLASSNAME = 'SYSDEFAULTSUBCLASS'
AND HISTOGRAM_TYPE = 'COORDACTESTCOST'"
STATISTICS_TIMESTAMP TOP NUMBER_IN_BIN
-------------------------- -------------------- --------------------
2008-09-03-09.49.04.455979 1 0
2008-09-03-09.49.04.455979 2 0
2008-09-03-09.49.04.455979 3 0
2008-09-03-09.49.04.455979 5 0
2008-09-03-09.49.04.455979 8 0
2008-09-03-09.49.04.455979 12 1
2008-09-03-09.49.04.455979 19 0
2008-09-03-09.49.04.455979 29 0
2008-09-03-09.49.04.455979 44 2
2008-09-03-09.49.04.455979 68 5
2008-09-03-09.49.04.455979 103 22
2008-09-03-09.49.04.455979 158 14
2008-09-03-09.49.04.455979 241 54
2008-09-03-09.49.04.455979 369 2
2008-09-03-09.49.04.455979 562 142
2008-09-03-09.49.04.455979 858 21
2008-09-03-09.49.04.455979 1309 123
2008-09-03-09.49.04.455979 1997 512
2008-09-03-09.49.04.455979 3046 643
2008-09-03-09.49.04.455979 4647 201
2008-09-03-09.49.04.455979 7089 875
2008-09-03-09.49.04.455979 10813 1445
2008-09-03-09.49.04.455979 16493 5386
2008-09-03-09.49.04.455979 25157 2409
2008-09-03-09.49.04.455979 38373 8940
2008-09-03-09.49.04.455979 58532 9820
2008-09-03-09.49.04.455979 89280 2149
2008-09-03-09.49.04.455979 136181 798
2008-09-03-09.49.04.455979 207720 2411
2008-09-03-09.49.04.455979 316840 14989
2008-09-03-09.49.04.455979 483283 9831
2008-09-03-09.49.04.455979 737162 1451
2008-09-03-09.49.04.455979 1124409 213
2008-09-03-09.49.04.455979 1715085 24
2008-09-03-09.49.04.455979 2616055 1
2008-09-03-09.49.04.455979 3990325 0
2008-09-03-09.49.04.455979 6086529 0
2008-09-03-09.49.04.455979 9283913 0
2008-09-03-09.49.04.455979 14160950 3
2008-09-03-09.49.04.455979 21600000 0
2008-09-03-09.49.04.455979 -1 0
In the histogram, the value in the number_in_bin column for queries whose top is greater than 2616055 is zero until top reaches 14160950, where the number_in_bin becomes 3. These three queries are outliers and can be controlled with an ESTIMATEDSQLCOST threshold set to trigger if the estimated cost of a query exceeds 10 million timerons which you can use to prevent such activities from executing and to monitor them more closely.
Example: Using the threshold violations event monitor: To control activities of a certain estimated cost, you want to define an ESTIMATEDSQLCOST threshold on your workload that applies only to that subset of your total workload exceeding a certain estimated cost. Having looked at the estimated cost histogram, you determined that activities with an estimated cost in the range of 0 to under 3 million timerons occur frequently and that activities with an estimated cost over 10 million timerons occur rarely (perhaps only a few times a day and perhaps always due to some flaw in the query, such as the use of a Cartesian join).
To verify that a threshold of 10 million timerons is effective in stopping those few activities a day that should not be allowed to run, you can create and activate a threshold event monitor:
CREATE THRESHOLD TH1
FOR DATABASE ACTIVITIES
ENFORCEMENT DATABASE
WHEN ESTIMATEDSQLCOST > 10000000
STOP EXECUTION
CREATE EVENT MONITOR DB2THRESHOLDVIOLATIONS
FOR THRESHOLD VIOLATIONS
WRITE TO TABLE
SET EVENT MONITOR DB2THRESHOLDVIOLATIONS STATE 1
After the end of the day, you can see what threshold violations occurred by querying the threshold violations table:
SELECT THRESHOLDID,
SUBSTR(THRESHOLD_PREDICATE, 1, 20) PREDICATE,
TIME_OF_VIOLATION,
THRESHOLD_MAXVALUE,
THRESHOLD_ACTION
FROM THRESHOLDVIOLATIONS_DB2THRESHOLDVIOLATIONS
ORDER BY TIME_OF_VIOLATION, THRESHOLDID
THRESHOLDID PREDICATE TIME_OF_VIOLATION THRESHOLD_MAXVALUE THRESHOLD_ACTION
----------- -------------------- -------------------------- -------------------- ----------------
1 EstimatedSQLCost 2008-09-02-22.39.10.000000 10000000 Stop
1 record(s) selected.
Example: Using the activity event monitor
The previous example showed how you can collect threshold information in an event monitor table to confirm that activities with a large estimated cost are being prevented from executing by a threshold. After seeing these threshold violations, you want to determine what the SQL statement texts producing these large queries are, so that you can use the explain facility to determine if an index is needed on the tables being queried.
Collecting this additional information requires creating and activating an activity event monitor and altering the threshold to turn on activity collection with details:
CREATE EVENT MONITOR DB2ACTIVITIES
FOR ACTIVITIES WRITE TO TABLE
SET EVENT MONITOR DB2ACTIVITIES STATE 1
ALTER THRESHOLD TH1
WHEN EXCEEDED
COLLECT ACTIVITY DATA WITH DETAILS
When you query the threshold violations table again after another business day has passed, you can perform a join with the ACTIVITYSTMT_DB2ACTIVITIES table to see the SQL statement text of any activity that violated the threshold:
SELECT THRESHOLDID,
SUBSTR(THRESHOLD_PREDICATE, 1, 20) PREDICATE,
TIME_OF_VIOLATION,
SUBSTR(STMT_TEXT,1,70) STMT_TEXT
FROM THRESHOLDVIOLATIONS_DB2THRESHOLDVIOLATIONS TV,
ACTIVITYSTMT_DB2ACTIVITIES A
WHERE TV.APPL_ID = A.APPL_ID
AND TV.UOW_ID = A.UOW_ID
AND TV.ACTIVITY_ID = A.ACTIVITY_ID
THRESHOLDID PREDICATE TIME_OF_VIOLATION STMT_TEXT
----------- -------------------- -------------------------- ----------------------------------------------------------------------
1 EstimatedSQLCost 2008-09-02-23.04.49.000000 select count(*) from syscat.tables,syscat.tables,syscat.tables
1 record(s) selected.