DB2 Version 9.7 for Linux, UNIX, and Windows

Example: Computing averages and a standard deviation from histograms in a DB2 workload manager configuration

One use for histograms is for obtaining the standard deviation for activity lifetimes. The example in this topic shows how bins are used for the calculation of this statistic.

A calculation of the average lifetime for each activity is a useful piece of information. However, the average alone does not accurately describe the user experience. If the variability in activity lifetime is large, the users whom you are supporting might see queries run fast at some times (which is fine) and slow at others (which might not be acceptable). When you define a goal for activity lifetimes, not only is the average lifetime of the activities important but also the standard deviation of the activity lifetime. You need to both understand and control variability to ensure that your users actually experience the observed average.

In a DB2® workload manager configuration, statistics are collected on each database partition. The following example shows how to obtain the average activity lifetime for a single database partition.

Suppose that you have a single-partition environment and histogram with the following bins. There are more bins in the real histograms, but this example is limited to eight bins to make the example simpler.
Bin 1 - 0 to 2 seconds
Bin 2 - 2 to 4 seconds
Bin 3 - 4 to 8 seconds
Bin 4 - 8 to 16 seconds
Bin 5 - 16 to 32 seconds
Bin 6 - 32 to 64 seconds
Bin 7 - 64 to 128 seconds
Bin 8 - 128 seconds to infinity
You can compute an approximation of the average by assuming that the average response time for a query that falls into a bin with the range x to y is (x + y)/2. You can then multiply this number by the number of queries that fell into the bin, sum across all bins, then divide the sum by the total count. For the preceding example, assume that the average response time for each bin is:
Bin 1 average lifetime = (0+2)/2 = 1
Bin 2 average lifetime = (2+4)/2 = 3
Bin 3 average lifetime = (4+8)/2 = 6
Bin 4 average lifetime = (8+16)/2 = 12
Bin 5 average lifetime = (16+32)/2 = 24
Bin 6 average lifetime = (32+64)/2 = 48
Bin 7 average lifetime = (64+128)/2 = 96
Assume that the following histogram was collected during the measurement period:
Bin 1     Bin 2     Bin 3    Bin 4     Bin 5     Bin 6     Bin 7     Bin 8
count     count     count    count     count     count     count     count
   20        30        80       10         5         3         2         0

To calculate average lifetime, bin 8 must be empty. Bin 8 only exists to let you know when you need to change the upper boundary of your range. For this reason, you must specify the upper bound for the range.

You can approximate the average lifetime for database partition 1 as follows:
average lifetime = (20 x 1 + 30 x 3 + 80 x 6 + 10 x 12 + 5 x 24 + 3 x 48 + 2 x 96) / 150
                 = (20 + 90 + 480 + 120 + 120 + 144 + 192) / 150
                 = 1166 / 150
                 = 7.77 seconds
You can approximate the lifetime standard deviation as follows:
Standard deviation = [(20 x (1 - 7.77)2 + 30 x (3 - 7.77)2 + ... ) / 150]1/2

For partitioned database environments, averages and standard deviations can be computed by first computing a combined histogram across all database partitions by adding the counts of each bin across the database partitions.

For example, assume that the database has two partitions, the histogram bin sizes are as described above, and the histogram has the following data:
Database  Bin 1     Bin 2     Bin 3    Bin 4     Bin 5      Bin 6     Bin 7     Bin 8
partition count     count     count    count     count      count     count     count
        1    20        30        80       10         5          3         2         0
        2     1         5        20       20         4          0         0         0
Because the bin sizes are the same across all database partitions, the overall histogram is easy to compute:
Bin 1     Bin 2     Bin 3     Bin 4     Bin 5     Bin 6     Bin 7     Bin 8
count     count     count     count     count     count     count     count
   21        35       100        30         9         3         2         0
From the combined histogram, you can calculate the overall lifetime average and standard deviation in a similar way to how they were computed for a single-partition environment:
Average lifetime = (21 x 1 + 35 x 3 + 100 x 6 + 30 x 12 + 9 x 24 + 3 x 48 + 2 x 96) / 200
                 = (21 + 105 + 600 + 360 + 216 + 144 + 192) / 200
                 = 1638 / 200
                 = 8.19 seconds

Standard deviation = [(21 x (1 – 8.19)2 + 35 x (3 - 7.77)2 + ... ) / 200]1/2