Event and real-time database and system monitoring in a DB2 pureScale environment

In addition to viewing the overall status of the components of a DB2 pureScale instance, you can examine specific aspects of the operation of cluster caching facilities and members using the DB2 monitoring infrastructure. You can use monitoring table functions and administrative views to display this information. You can also use selected event monitors to capture events as they occur.

DB2 V9.7 introduced a number of enhancements to the monitoring infrastructure for the DB2 product. One of these enhancements was a set of table functions that provide access to hundreds of in-memory monitor elements that you can use to query the state of your database environment at a specific point in time. Other enhancements included improved event monitors for capturing information about such things as locking, units of work, and activities as they occur.

The DB2 pureScale Feature extends the monitoring capabilities built into the DB2 database with monitor elements that you can use to view data that describes specific aspects of the operation of cluster caching facilities (also known as CFs) and members in a DB2 pureScale instance. However, there are some differences between monitoring in DB2 pureScale instances and other DB2 instances to be aware of, including:

The ability to monitor CFs in addition to DB2 members

CFs, with the different role they play as compared to members in a DB2 pureScale environment, introduce additional monitoring needs. For example, in DB2 instances other than DB2 pureScale instances, you might be interested in monitoring buffer pool hit ratios, which compare the number of pages that are found in memory to the number of pages that must be read from disk. Higher buffer pool hit ratios generally reflect better performance, because less I/O is involved in bringing needed pages into memory. In a DB2 pureScale environment, all physical page reads from disk are performed by the members, but only after they first check with the CF to see if the group buffer pool has a record of any other member with a valid page that they can use. Thus, whereas you might be accustomed to tuning only local buffer pools in a DB2 environment other than a DB2 pureScale environment, monitoring buffer pool hit ratios in the group buffer pool (GBP) of the CF is also important in a DB2 pureScale environment. The more often pages can be found in either a local buffer pool or the GBP, the less often they must be read in from disk.
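As a rough illustration of the two kinds of hit ratio, the following query sketches how local and group buffer pool hit ratios for data pages might be computed for each member. It assumes the MON_GET_BUFFERPOOL table function and the monitor elements pool_data_lbp_pages_found, pool_data_gbp_l_reads, and pool_data_gbp_p_reads, none of which are named in this topic. It also ignores index, XML, temporary, and asynchronous reads, so treat it as a simplified starting point rather than a complete formula.

```sql
-- Simplified sketch: data pages only. The function and column names
-- are assumptions about the DB2 monitoring interface, not taken from
-- this topic.
SELECT VARCHAR(BP_NAME, 20) AS BP_NAME,
       MEMBER,
       -- Share of logical data reads satisfied by the member's local buffer pool
       CASE WHEN POOL_DATA_L_READS > 0
            THEN DEC(100.0 * POOL_DATA_LBP_PAGES_FOUND
                     / POOL_DATA_L_READS, 5, 2)
       END AS LBP_DATA_HIT_PCT,
       -- Share of GBP data requests that did not lead to a read from disk
       CASE WHEN POOL_DATA_GBP_L_READS > 0
            THEN DEC(100.0 * (POOL_DATA_GBP_L_READS - POOL_DATA_GBP_P_READS)
                     / POOL_DATA_GBP_L_READS, 5, 2)
       END AS GBP_DATA_HIT_PCT
FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2))
ORDER BY BP_NAME, MEMBER
```

The CASE expressions return NULL instead of failing with a division by zero for buffer pools that have not yet serviced any reads.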

In addition to the GBP, the global lock manager (GLM) is another component of the CF that you can monitor. The GLM manages locking of objects across all the members in a DB2 pureScale instance. The DB2 pureScale Feature adds monitor elements that you can use to monitor locking between members.
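A query along the following lines might be used to look at locking between members. It is a sketch that assumes the MON_GET_WORKLOAD table function and the global lock monitor elements lock_waits_global, lock_wait_time_global, lock_timeouts_global, and lock_escals_global; these names do not appear in this topic, so verify them against the monitor element reference for your release.

```sql
-- Sketch only: the table function and the *_GLOBAL column names are
-- assumptions, not taken from this topic.
SELECT VARCHAR(WORKLOAD_NAME, 30) AS WORKLOAD_NAME,
       MEMBER,
       LOCK_WAITS_GLOBAL,      -- lock waits caused by an application on another member
       LOCK_WAIT_TIME_GLOBAL,  -- time spent in those waits
       LOCK_TIMEOUTS_GLOBAL,   -- lock timeouts where the holder was on another member
       LOCK_ESCALS_GLOBAL      -- lock escalations triggered by another member
FROM TABLE(MON_GET_WORKLOAD(NULL, -2))
ORDER BY LOCK_WAIT_TIME_GLOBAL DESC
```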

How monitor elements in a DB2 pureScale instance are reported

In general, the mechanics of monitoring in a DB2 pureScale instance are similar to the mechanics of monitoring in other DB2 instances. For example, the MON_GET_TABLESPACE table function, which returns information about table spaces in a database, works similarly in both DB2 pureScale instances and other DB2 instances. In a DB2 pureScale instance, the scope of some monitor elements is limited to a specific member, while the scope of others is global, across all members. For example, the data from monitor elements such as direct_reads or pool_data_l_reads is specific to the read activity performed by a member. By comparison, the value of a monitor element such as tbsp_total_pages, which represents a physical attribute of a table space, is the same across all members, because all members share the same table space. For example, consider the following query:
SELECT VARCHAR(TBSP_NAME, 30) AS TBSP_NAME,  
       MEMBER, POOL_DATA_L_READS, 
       TBSP_TOTAL_PAGES  
FROM TABLE(MON_GET_TABLESPACE('USERSPACE1',-2))

The results of this query look like the following example:

TBSP_NAME                      MEMBER POOL_DATA_L_READS    TBSP_TOTAL_PAGES
------------------------------ ------ -------------------- --------------------
USERSPACE1                          1                    0                 4096
USERSPACE1                          2                    0                 4096
USERSPACE1                          3                    0                 4096
USERSPACE1                          0                   36                 4096

  4 record(s) selected.

In this example, the number of logical reads from the local buffer pool is different for each member because each member performs its reads independently of other members. However, the total number of pages for the table space is the same across all members, because all members are working from the same instance of USERSPACE1.

Effects of component failure on monitor element reporting

If a host, member, or CF in a DB2 pureScale environment fails, you can still retrieve monitor elements from the instance, provided that the entire DB2 pureScale instance has not been taken down. However, the components that fail do not generate statistics. This fact is apparent if you are running a query such as the example shown in How monitor elements in a DB2 pureScale instance are reported, where data from each member is shown individually. If you use a query that aggregates information across members, though, you might not notice that data from a member is missing.
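One way to make a missing member visible in an aggregated result is to report how many members contributed rows. The following sketch reuses the MON_GET_TABLESPACE query from the earlier example; the MEMBERS_REPORTING and TOTAL_DATA_L_READS column aliases are illustrative names only.

```sql
-- Aggregate logical reads across members, and also count how many
-- members returned data, so that a failed member shows up as a
-- lower-than-expected MEMBERS_REPORTING value.
SELECT VARCHAR(TBSP_NAME, 30) AS TBSP_NAME,
       COUNT(DISTINCT MEMBER) AS MEMBERS_REPORTING,
       SUM(POOL_DATA_L_READS) AS TOTAL_DATA_L_READS
FROM TABLE(MON_GET_TABLESPACE('USERSPACE1', -2))
GROUP BY TBSP_NAME
```

In the four-member instance from the earlier example, a MEMBERS_REPORTING value lower than 4 would indicate that the total does not include every member.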

Another thing to keep in mind is that if a member fails while monitor element data collection is taking place, the data collection process pauses until the communications problem with the failed member has been detected, or the TCP/IP timeout period has passed. In this situation, the data is still reported, but it contains no information from the failed member.

Finally, keep in mind that if a member fails, all the statistics accumulated in the monitor elements are reset to 0.