Question & Answer
Question
How to interpret monitor data collected by db2mon CLP scripts?
Answer
db2mon is a set of CLP scripts that use the Db2 lightweight in-memory 'mon_get' monitoring interfaces to collect monitoring data for a specific period. The mon_get interfaces return cumulative values that increase from the time of database activation, so to find the activity during a particular period, each metric must be collected twice (at the beginning and at the end of the period) and the difference calculated (using stored procedures).
Db2 11.1
Performance monitor data collection and reports using db2mon
Note: It is advised that you collect data for a maximum of 300 seconds (5 minutes) to avoid wrapping some of the counters. For monitoring longer periods, collecting successive reports is preferred.
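The before/after delta principle db2mon applies can be illustrated with a quick calculation; the counter values and interval below are hypothetical:

```shell
# A mon_get counter is cumulative, so the rate over an interval is
# (end snapshot - start snapshot) / elapsed seconds.
start=1000000    # hypothetical ACT_COMPLETED_TOTAL at the 'before' snapshot
end=1007500      # same counter at the 'after' snapshot
interval=300     # seconds between the two snapshots
awk -v s="$start" -v e="$end" -v t="$interval" \
    'BEGIN { printf "%.1f activities/s\n", (e - s) / t }'
# prints: 25.0 activities/s
```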
The following files are provided:
• db2mon.sh
• db2mon.sql (concatenated form of db2monBefore.sql, db2monInterval.sql, and db2monAfter.sql)
• db2mon_import.sql
• db2mon_export.sql
• db2mon_report.sql
Before running
The CLP scripts require a connection to the database. db2mon.sql also requires the database to have a user temporary tablespace. Many systems already have such a tablespace; if not, one can be created, for example:
db2 create user temporary tablespace utemp
db2mon.sh creates a user temporary tablespace itself and drops it when it finishes, so no manual setup is needed when using it.
Note: For db2mon.sql and db2mon_export.sql to properly collect monitoring data, monitoring must be enabled at the database configuration level. MON_ACT_METRICS and MON_REQ_METRICS must be set to at least BASE, which is the default value. MON_OBJ_METRICS defaults to EXTENDED which gives full monitor information on tables and indexes, and is ideal for db2mon.
Both the existence of a user temporary tablespace and the minimum required MON_x_METRICS settings are checked by db2mon itself in a prerequisites section at the top of the report.
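A minimal sketch of checking these prerequisites by hand, assuming a database named MYDBNAME (db2mon performs the same checks itself):

```shell
# Check the monitor settings db2mon depends on (MYDBNAME is a placeholder).
db2 connect to MYDBNAME
db2 get db cfg for MYDBNAME | grep -i "METRICS"

# If a setting is below the required level, raise it; for example:
db2 update db cfg for MYDBNAME using MON_OBJ_METRICS EXTENDED

# Confirm a user temporary tablespace exists (DATATYPE 'U' = user temporary):
db2 "select tbspace from syscat.tablespaces where datatype = 'U'"
```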
Collecting and analyzing monitor data online
db2mon is most commonly run in online mode, where it runs to completion and produces a report on stdout on the system it is monitoring. All the SQL that computes the delta values and produces the report therefore also appears in the output. db2mon.sh automatically connects to the database, creates a user temporary tablespace and a small private bufferpool (to minimize impact on the system), runs db2mon.sql, and then disconnects.
db2mon.sh MYDBNAME > db2mon.out
Many errors can be reported in db2mon.out when the script attempts to drop tables that don't yet exist. These errors can be ignored.
• Other types of errors may indicate that a user temporary tablespace doesn't exist, or that the script was generated for the wrong version of Db2.
• The report section begins after the text "STARTS", easily found by searching forward from the start of the output.
• The report can be wide so it is best to use a tool like less(1) or vim(1) where you can truncate lines if you like.
db2mon can be run online by using your current database connection, as follows.
db2 -tvf db2mon.sql > db2mon.out
Important: If you run on your current connection and interrupt db2mon.sql during execution (for example with Ctrl-C), CURRENT SCHEMA is likely to be set to SESSION, which could affect subsequent SQL executed on that connection. db2mon uses CURRENT SCHEMA to resolve its table references to the DGTTs used during online collection in db2mon.sql, or to the correct schema for permanent tables produced by db2mon_import.sql and db2mon_report.sql.
Using db2mon.sh also avoids any issue with CURRENT SCHEMA since it uses a separate, temporary connection.
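For longer monitoring windows, the 5-minute guideline above suggests collecting a series of shorter reports rather than one long one. A sketch, where MYDBNAME and the report count are placeholders:

```shell
# Collect six successive db2mon reports, each timestamped, instead of one
# long interval that might wrap counters.
for i in 1 2 3 4 5 6; do
  db2mon.sh MYDBNAME > "db2mon.$(date +%Y%m%d_%H%M%S).out"
done
```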
Collecting and analyzing monitor data offline
Separate SQL files are provided for export, import and report generation. If the monitored system is performance sensitive, the performance data can be collected using the export script and transferred to another system for import and reporting.
Step 1
db2 -tvf db2mon_export.sql > db2mon_export.out
db2mon_export.sql exports the contents of all mon_get functions that db2mon uses to IXF files, twice: once for the 'start' data and once for the 'end' data. The overhead of this is low. The files can be transferred to another system for import into another Db2 database for analysis. Note: the report can be generated using any platform or version of Db2; it does not have to match the level or architecture where the data was collected.
Step 2
On the analysis system, run the following from the directory where the IXF files were restored or unpacked:
db2 -tvf db2mon_import.sql > db2mon_import.out
db2mon_import.sql uses the Db2 IMPORT utility to reconstitute the mon_get data into tables for analysis. IMPORT is used because it creates the tables automatically (you don't have to reproduce a CREATE TABLE statement matching the source system's version of mon_get_workload, for example). IMPORT may be slower than LOAD, but it does not require the table definitions to be recorded separately.
The imported tables are permanent tables, not DGTTs. They are created in the current default schema and tablespace, so the user must have the authority to create permanent tables and indexes. CURRENT SCHEMA can be set to a unique value to represent this set of monitoring data, allowing multiple data sets to be stored.
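For example, to keep several collections side by side, the schema can be set before each import; the database and schema names below are placeholders:

```shell
# Import one monitoring data set into its own schema on the analysis system.
db2 connect to ANALYSISDB
db2 "set current schema PERF_RUN_01"
db2 -tvf db2mon_import.sql > db2mon_import.out
```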
Actual analysis and reporting are then done in the next and final step.
Step 3
db2 -tvf db2mon_report.sql > db2mon_report.out
The report that is generated by db2mon_report.sql is shorter than the output of db2mon.sql because it does not have all the DGTT declarations and delta calculations.
Note: Export provides ALL columns of almost all the monitoring table functions, not just the ones that are listed for querying in db2mon.pl. Exporting all columns can be helpful when reference to the original data is needed for analysis. Ad-hoc queries from the command line, or altered versions of db2mon_report.sql, can take advantage of the extra metrics in the exported data set.
db2mon Report content
The report includes many useful queries, roughly grouped into sections.
• Point-in-time data (such as currently executing SQL/utilities, lock waits) are collected at the beginning and end of the interval.
• Cumulative data, measured over the whole interval.
– Database-level, tablespace-level, bufferpool-level, table-level, query-level, connection-level, and so on.
– Different deployment types - “regular” Db2 ESE, Db2 pureScale, Db2 with BLU, and so on.
– Environment information – database and instance configuration, registry variables, CPU counts and utilization, memory utilization
Examples of db2mon report output
1) Find the database throughput section near the beginning to identify work done by the system
======================================
Throughput metrics at database level
======================================
select min(ts_delta) ts_delta, member, decimal((sum(act_completed_total) / float(min(ts_delta))), 10, 1) as act
TS_DELTA MEMBER ACT_PER_S CMT_PER_S RB_PER_S DDLCK_PER_S SEL_P_S UID_P_S ROWS_INS_P_S
----------- ------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
35 0 22629.7 2361.1 0.0 0.0 13089.6 9540.0 4364.0
35 1 24331.0 2525.0 0.0 0.0 14064.1 10266.8 4638.2
35 2 27331.5 2842.1 0.0 0.0 15804.4 11527.1 5204.6
35 3 25674.2 2682.0 0.0 0.0 14859.5 10814.6 4878.8
4 record(s) selected.
The "Throughput metrics" report shows four members, each at about 2400 tx/s or 25,000 SQL/s, plus metrics, which are measured over 35 seconds. The process of collecting data can sometimes increase the planned interval (for example, 30 seconds in this case) slightly.
Abbreviated names are common in the report.
• ACT_PER_S: Activities per second
• CMT_PER_S: Commits per second
• RB_PER_S: Rollbacks per second
• DDLCK_PER_S: Deadlocks per second
• SEL_P_S: Select statements per second
• UID_P_S: Update/Insert/Delete statements per second
• ROWS_INS_P_S: Rows inserted per second
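Per-member rates from this section can be aggregated with simple command-line tools; for example, summing the CMT_PER_S column (column 4) of the rows shown above:

```shell
# Total commit rate across the four members in the sample output.
printf '%s\n' \
  '35 0 22629.7 2361.1' \
  '35 1 24331.0 2525.0' \
  '35 2 27331.5 2842.1' \
  '35 3 25674.2 2682.0' |
awk '{ sum += $4 } END { printf "total commits/s: %.1f\n", sum }'
# prints: total commits/s: 10410.2
```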
2) Analysis of processing time for the database.
Spotting high compile times (over 15%) on a transactional system can be a sign of an application that is not following the best practice of using parameter markers and is instead using literals.
======================================================
Time breakdown at database level (wait + processing)
======================================================
select member, integer(sum(total_rqst_time)) as total_rqst_tm, decimal(sum(total_compile_time) / float(sum
MEMBER TOTAL_RQST_TM PCT_COMPILE PCT_SECTION PCT_SORT PCT_COL PCT_COL_SYNOP PCT_COMMIT PCT_RBACK PCT_CONN
------ ------------- ----------- ----------- -------- ------- ------------- ---------- --------- --------
0 1035374 0.04 78.59 0.74 0.00 0.00 9.38 0.00 0.10
1 438810 0.01 75.85 1.02 0.00 0.00 15.15 0.00 0.12
2 492605 0.01 74.57 1.06 0.00 0.00 16.66 0.00 0.09
3 482646 0.01 76.32 1.04 0.00 0.00 14.73 0.00 0.11
4 record(s) selected.
Most time is spent executing sections (PCT_SECTION includes processing and wait times for SQL), with a chunk that is also spent in commit processing – which is considered normal.
3) Analysis of time spent waiting by the database. This report is often the most important top-level information on performance health.
==============================
Wait times at database level
==============================
select w.member, integer(sum(total_rqst_time)) as total_rqst_tm, integer(sum(total_wait_time)) as total_wait
MEMBER TOTAL_RQST_TM TOTAL_WAIT_TM PCT_RQST_WAIT PCT_LOCK PCT_GLB_LOCK PCT_LTCH PCT_LG_DSK PCT_RCLM PCT_CF
------ ------------- ------------- ------------- -------- ------------ -------- ---------- -------- -------
0 1035374 732450 70.74 1.31 0.57 0.89 4.28 1.07 56.26
1 438810 291659 66.46 2.29 1.51 0.57 8.03 1.91 46.21
2 492605 320535 65.06 1.50 1.13 0.92 9.37 1.67 44.36
3 482646 319718 66.24 1.39 0.86 1.05 7.74 1.87 46.23
4 record(s) selected.
In this example, each member spends about 70% of each request waiting (PCT_RQST_WAIT). The majority of this wait time is spent on CF communication (PCT_CF), followed by log writes (PCT_LG_DSK) and lock waits (PCT_LOCK).
Note: most of these report tables are wider than the space available – the report tables were trimmed down on the right for this document.
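The headline PCT_RQST_WAIT figure is simply total wait time as a percentage of total request time; reproducing member 0's value from the table above:

```shell
# 732450 ms waited out of 1035374 ms of request time (member 0).
awk 'BEGIN { printf "%.2f\n", 732450 / 1035374 * 100 }'
# prints: 70.74
```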
4) Identify the busiest statements.
Note: Various sections of the report show statement data from the package cache, which means statement executions that finished during the monitoring interval. Statements still executing when monitoring finished are shown in the point-in-time section at the beginning of the report.
The top 100 statements by total activity time are listed. Different views are shown in tables, with focus on basic metrics, wait time, sorting, IO, etc.
Comparing the database-level wait time table with the information on the busiest statements can identify whether high wait times are caused by just a few statements or a system-wide issue affecting most/all statements. System-wide issues may be improved by configuration changes or changes in hardware. Statement performance may be improved by altering SQL, adding or removing indexes, refreshing statistics (e.g. RUNSTATS), etc.
======================================
Top SQL statements by execution time
======================================
select member, integer(num_exec_with_metrics) as num_exec, coord_stmt_exec_time, decimal(coord_stmt_exec_time / double(num_exec_with_metrics), 10, 2) as avg_coord_exec_time, decimal( (coord_stmt_exec_time / double(
MEMBER NUM_EXEC COORD_STMT_EXEC_TIME AVG_COORD_EXEC_TIME PCT_COORD_STMT_EXEC_TIME TOTAL_CPU_TIME AVG_CPU_TIME PCT_WAIT_TIME AVG_SECT_TIME AVG_COL_TIME STMT_TEXT
------ ----------- -------------------- ------------------- ------------------------ -------------------- -------------------- ------------- ------------- ------------ ----------------------------------------------
0 61729 301979 4.89 15.09 29928189 484 53.67 4.89 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
3 69026 150782 2.18 7.53 33255990 481 40.93 2.18 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
2 72832 148618 2.04 7.42 34689087 476 36.56 2.04 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
1 64562 137670 2.13 6.88 31215907 483 41.59 2.13 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
0 124839 112563 0.90 5.62 4468845 35 89.65 0.90 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_
0 124833 111863 0.89 5.59 3686860 29 90.54 0.89 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
0 124836 58906 0.46 2.91 2807094 22 85.45 0.46 0.00 Insert into ORDER_LINE values (?, ?, ?, ?, ?,
0 11350 57011 5.02 2.84 1997313 175 93.18 5.02 0.00 Update ORDER_LINE set OL_DELIVERY_D = ? where
2 147141 46613 0.31 2.33 4830165 32 84.75 0.31 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
3 138286 45568 0.32 2.27 4618735 33 85.28 0.32 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
3 138285 40548 0.29 2.02 5767918 41 83.84 0.29 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_
2 147137 39975 0.27 1.99 6031554 40 82.78 0.27 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_
1 131525 39818 0.30 1.99 4457211 33 86.20 0.30 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
==============================================================
Wait time breakdown for top SQL statements by execution time
==============================================================
select member, decimal((total_act_wait_time / double(total_act_time)) * 100, 5, 2) as pct_wait, decimal((log_disk_wait_time / double(total_act_time)) * 100, 5, 2) as pct_lg_dsk, decimal((log_buffer_wai
MEMBER PCT_WAIT PCT_LG_DSK PCT_LG_BUF PCT_LOCK PCT_GL_LK PCT_LTCH PCT_RCLM PCT_CF PCT_PFTCH PCT_DIAG PCT_POOL_RD PCT_DIR_WR PCT_DIR_RD STMT_TEXT
------ -------- ---------- ---------- -------- --------- -------- -------- ------- --------- -------- ----------- ---------- ---------- -----------------------------------------------------------------
0 53.67 0.00 0.00 2.37 0.36 2.34 0.00 47.55 0.00 0.00 1.40 0.00 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
3 40.93 0.00 0.00 2.29 1.02 2.57 0.00 30.97 0.00 0.00 5.06 0.00 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
2 36.56 0.00 0.00 1.42 0.65 1.88 0.00 29.37 0.00 0.00 3.87 0.00 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
1 41.59 0.00 0.00 5.04 3.21 1.05 0.00 31.96 0.00 0.00 3.52 0.00 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
0 89.65 0.16 0.00 0.00 0.00 0.07 0.35 89.06 0.00 0.00 0.00 0.00 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
0 90.54 0.00 0.00 2.32 2.29 0.00 3.59 75.30 0.00 0.00 9.27 0.00 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
0 85.45 4.61 0.00 0.00 0.00 0.71 3.77 75.19 0.00 0.00 0.49 0.00 0.00 Insert into ORDER_LINE values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
0 93.18 0.00 0.00 0.00 0.00 0.02 0.05 88.41 0.00 0.00 4.67 0.00 0.00 Update ORDER_LINE set OL_DELIVERY_D = ? where OL_W_ID = ? and OL_
2 84.75 0.00 0.00 5.16 5.12 0.00 4.94 63.97 0.00 0.00 10.32 0.00 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
3 85.28 0.00 0.00 0.23 0.19 0.00 7.03 67.87 0.00 0.00 10.12 0.00 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
3 83.84 0.19 0.00 0.00 0.00 0.01 1.50 82.12 0.00 0.00 0.00 0.00 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
2 82.78 0.20 0.00 0.00 0.00 0.01 0.71 81.85 0.00 0.00 0.00 0.00 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
1 86.20 0.00 0.00 0.00 0.33 0.00 4.85 70.23 0.00 0.00 10.76 0.00 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
1 83.34 0.22 0.00 0.00 0.00 0.00 0.90 82.20 0.00 0.00 0.00 0.00 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
Tip: The "Statement & Plan Identifiers" section shows the executable_id for each of the 100 statements, which you can use to query deeper into mon_get_pkg_cache_stmt, or get the explain plan.
db2 "call explain_from_section(x'<executable id>','M',NULL,0,'<current user>',?,?,?,?,?)"
"Top SQL statements by execution time, aggregated by PLANID" is similar to the "Top SQL statement by execution time". All statements that differ by literal values (and so have the same PLANID hash value) are aggregated to show the total costs. Useful when a transactional application is driving lots of SQL with literal values. Available in Db2 10.5 and later.
The following table shows buffer pool activity for the top statements.
==========================================================
IO statistics per stmt - top statements by execution time
==========================================================
select member, integer(num_exec_with_metrics) num_exec, decimal(pool_data_l_reads/double(num_exec_with_metrics), 16,1) as avg_d_lrd, decimal(pool_data_p_reads/double(num_exec_with_metrics), 10,1)
MEMBER NUM_EXEC AVG_D_LRD AVG_D_PRD AVG_I_LRD AVG_I_PRD AVG_TD_PRD AVG_TI_PRD AVG_COL_LRD AVG_COL_PRD AVG_DIR_R_RQS AVG_DIR_W_RQS STMT_TEXT
------ ----------- ------------------ ------------ ------------------ ------------ ------------ ------------ ------------------ ------------ ------------- ------------- --------------------------
0 61729 0.0 0.0 607.4 0.1 0.0 0.0 0.0 0.0 0.0 0.0 Select Count(Distinct S_I_I
3 69026 0.0 0.0 607.4 0.4 0.0 0.0 0.0 0.0 0.0 0.0 Select Count(Distinct S_I_I
2 72832 0.0 0.0 607.3 0.2 0.0 0.0 0.0 0.0 0.0 0.0 Select Count(Distinct S_I_I
1 64562 0.0 0.0 607.3 0.2 0.0 0.0 0.0 0.0 0.0 0.0 Select Count(Distinct S_I_I
0 124839 2.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Update STOCK set S_QUANTITY
0 124833 1.1 0.1 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Select S_QUANTITY, S_DIST_0
0 124836 1.0 0.0 4.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Insert into ORDER_LINE valu
0 11350 20.3 0.4 4.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Update ORDER_LINE set OL_DE
2 147141 1.1 0.1 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Select S_QUANTITY, S_DIST_0
3 138286 1.1 0.1 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Select S_QUANTITY, S_DIST_0
3 138285 2.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Update STOCK set S_QUANTITY
2 147137 2.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Update STOCK set S_QUANTITY
1 131525 1.1 0.1 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Select S_QUANTITY, S_DIST_0
1 131523 2.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Update STOCK set S_QUANTITY
0 124837 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Select I_NAME, I_PRICE, I_D
0 11343 0.0 0.0 4.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Select sum(OL_AMOUNT) from
2 147135 1.0 0.0 4.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Insert into ORDER_LINE valu
3 138285 1.0 0.0 4.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Insert into ORDER_LINE valu
1 131522 1.0 0.0 4.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Insert into ORDER_LINE valu
0 12533 1.0 0.0 6.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Insert into ORDERS values (
2 13568 20.4 0.4 4.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Update ORDER_LINE set OL_DE
I/O information can indicate whether a statement is using an index or not. AVG_I_LRD is average index logical reads per execution. In this example, the first four entries (the same statement on each of four members) have high AVG_I_LRD values and zero AVG_D_LRD (average data logical reads) – implying an index-based plan. If an expensive statement has high data logical reads and low index logical reads, changes to the indexes may improve performance.
5) Examining log disk performance
One or a few very busy statements may suggest a missing index or a bad plan. If that does not provide a solution, compare the highest wait times at the database and statement levels to identify the largest contributors to wait time, then consult the corresponding report sections for more detailed information.
In db2mon, the individual wait time percentages (log disk wait time, for example) are calculated as a percent of total request time, not as a percent of total wait time. The latter approach (where everything adds up to 100% wait time) is what monreport.dbsummary does.
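Converting between the two conventions is a one-line calculation. Using member 0 from the earlier wait-time table (4.28% log disk wait out of 70.74% total request wait):

```shell
# db2mon convention: 4.28% of total request time is log disk wait.
# monreport.dbsummary convention: express it as a share of wait time instead.
awk 'BEGIN { printf "%.2f\n", 4.28 / 70.74 * 100 }'
# prints: 6.05
```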
A high percentage of log disk wait time (PCT_LG_DSK in db2mon output) is common for transactional applications:
==========================
Database log write times
==========================
select member, num_log_write_io, case when ts_delta > 0 then decimal( double(num_log_write_io) / ts_delta, 10, 4 ) else null end as
MEMBER NUM_LOG_WRITE_IO LOG_WRITE_IO_PER_S LOG_WRITE_MB_PER_S LOG_WRITE_TIME LOG_WRITE_TIME_PER_IO_MS NUM_LOG_BUFFER_FULL
------ -------------------- ------------------ ------------------ -------------------- ------------------------ --------------------
0 57237 1788.6562 7.0000 25306 0.4421 0
1 68286 2133.9375 8.3437 26345 0.3858 0
2 69373 2167.9062 8.5312 26669 0.3894 0
3 69286 2165.1875 8.4687 26741 0.3859 0
4 record(s) selected.
In this example, log write time per I/O is about 0.4 milliseconds and the percentage of log disk wait time is low. "Good" values vary from system to system, but a high log wait percentage combined with write times above 4 milliseconds may suggest changes to the storage configuration (cache, RAID, number of LUNs per filesystem, etc.).
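The LOG_WRITE_TIME_PER_IO_MS column is simply total write time divided by the number of write I/Os; reproducing member 0's value from the table above:

```shell
# 25306 ms of log write time over 57237 write I/Os (member 0).
awk 'BEGIN { printf "%.4f\n", 25306 / 57237 }'
# prints: 0.4421
```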
6) Buffer pool read time
A high percentage of bufferpool read time (PCT_POOL_RD) is a common bottleneck, especially when the database is warming up (e.g. shortly after database activation, or when a new application starts making connections). Average pool read times (AVG_READ_TIME) are reported at the bufferpool and tablespace levels. Similarly, a high percentage of direct I/O time (for Large Objects) can be tracked in AVG_DRCT_READ_TIME and AVG_DRCT_WRITE_TIME.
Note: the abbreviations here include:
• POOL_DATA_L_READS: bufferpool data page logical reads (i.e. base table ORGANIZE BY ROW access)
• POOL_DATA_P_READS: bufferpool data page physical reads
• POOL_INDEX_L_READS: bufferpool index page logical reads (i.e. index access)
• POOL_INDEX_P_READS: bufferpool index page physical reads
• POOL_COL_L_READS: bufferpool column-organized data page logical reads (i.e. base table ORGANIZE BY COLUMN access)
• POOL_COL_P_READS: bufferpool column-organized data page physical reads
============================
Bufferpool read statistics
============================
select member, substr(bp_name,1,20) as bp_name, pool_data_l_reads, pool_data_p_reads, pool_index_l_reads, pool_index_p_reads, pool_col_l_reads, pool_col_p_reads, pool_read_time, case when
MEMBER BP_NAME POOL_DATA_L_READS POOL_DATA_P_READS POOL_INDEX_L_READS POOL_INDEX_P_READS POOL_COL_L_READS POOL_COL_P_READS POOL_READ_TIME AVG_READ_TIME
------ -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------
2 IBMDEFAULTBP 3037899 44159 45043859 27427 0 0 19619 0.27
3 IBMDEFAULTBP 2841993 41020 42586533 34728 0 0 21063 0.27
1 IBMDEFAULTBP 2696917 38268 40018451 23941 0 0 17030 0.27
0 IBMDEFAULTBP 2518978 38610 37512769 10298 0 0 28304 0.57
4 record(s) selected.
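A useful number to derive from this table is the buffer pool hit ratio, (logical reads - physical reads) / logical reads; for member 0's data and index pages above:

```shell
# Member 0: data 2518978 logical / 38610 physical; index 37512769 / 10298.
awk 'BEGIN {
  printf "data hit:  %.2f%%\n", (2518978 - 38610) / 2518978 * 100
  printf "index hit: %.2f%%\n", (37512769 - 10298) / 37512769 * 100
}'
# prints:
# data hit:  98.47%
# index hit: 99.97%
```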
===============================
Disk read and write I/O times
===============================
select member, substr(tbsp_name,1,20) as tbsp_name, (pool_data_p_reads + pool_index_p_reads+ pool_col_p_reads) as num_reads, case when ((pool_data_p_reads + pool_index_p_reads+ pool
MEMBER TBSP_NAME NUM_READS AVG_READ_TIME DIRECT_READ_REQS AVG_DRCT_READ_TIME NUM_WRITES AVG_WRITE_TIME DIRECT_WRITE_REQS AVG_DRCT_WRITE_TIME
------ -------------------- -------------------- ------------- -------------------- ------------------ -------------------- -------------- -------------------- -------------------
3 TBS_OM 29235 0.27 0 - 0 - 0 -
2 TBS_OM 22195 0.26 0 - 0 - 0 -
2 TBS_CST 21312 0.27 0 - 0 - 0 -
3 TBS_CST 19700 0.28 0 - 0 - 0 -
1 TBS_OM 19201 0.26 0 - 0 - 0 -
1 TBS_CST 18485 0.27 0 - 0 - 0 -
0 TBS_CST 17705 0.56 0 - 0 - 0 -
2 TBS_STK 16568 0.27 0 - 0 - 0 -
0 TBS_STK 16127 0.59 0 - 0 - 0 -
3 TBS_STK 15559 0.27 0 - 0 - 0 -
7) Round-trip time for messages between the members and CFs
For reports that show high percentage CF wait time (PCT_CF), the following report shows performance data on messages exchanged between the Db2 member and the CFs.
===================================================================
Round-trip CF command execution counts and average response times
===================================================================
select member, id, cast(substr(cf_cmd_name,1,30) as varchar(30)) as cf_cmd_name, total_cf_requests, decimal( doub
MEMBER ID CF_CMD_NAME TOTAL_CF_REQUESTS AVG_CF_REQUEST_TIME_MICRO
------ ------ ------------------------------ -------------------- -------------------------
2 128 SetLockState 1016554 60.11
3 128 SetLockState 955295 65.82
1 128 SetLockState 905310 62.04
0 128 SetLockState 851689 223.05
3 128 ReadAndRegister 682169 88._8
2 128 ReadAndRegister 671832 83.39
1 128 ReadAndRegister 662450 84.51
0 128 ReadAndRegister 627922 282.94
2 128 SetLockStateMultiple 413551 70.50
3 128 SetLockStateMultiple 387168 77.93
1 128 SetLockStateMultiple 367427 73.74
0 128 SetLockStateMultiple 345954 252.11
8) Reclaims of Index and Data pages
If reclaim wait time (PCT_RCLM) is above zero, the following data shows which table has the most reclaims, and whether the table or an index is being reclaimed:
===============================================
Page reclaim metrics for index and data pages
===============================================
select member, substr(tabschema,1,20) as tabschema, substr(tabname,1,40) as tabname, substr(objtype,1,10) as objtype, data_partition_id, iid, (page
MEMBER TABSCHEMA TABNAME OBJTYPE DATA_PARTITION_ID IID PAGE_RECLAIMS RECLAIM_WAIT_TIME
------ -------------------- ---------------------------------------- ---------- ----------------- ------ -------------------- --------------------
0 DTW ORDERS INDEX - - 4334 3357
2 DTW ORDER_LINE INDEX - - 3870 3096
2 DTW ORDERS INDEX - - 3652 2732
3 DTW ORDERS INDEX - - 3638 2724
1 DTW ORDER_LINE INDEX - - 3634 2666
3 DTW ORDER_LINE INDEX - - 3577 2391
1 DTW ORDERS INDEX - - 3502 2326
0 DTW ORDER_LINE INDEX - - 4169 2286
0 DTW STOCK_501_750 INDEX - - 165 983
0 DTW STOCK_751_1000 INDEX - - 199 952
1 DTW NEW_ORDER INDEX - - 441 880
This example shows the highest reclaim activity on indexes on ORDERS and ORDER_LINE, while reclaim wait time remains low at the database level. Reclaims of index pages are the most common type, and can be reduced by using RANDOM indexes, CURRENT MEMBER partitioning, range partitioning, or other approaches.
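One of the mitigations mentioned above can be sketched as follows. The index and column names are hypothetical, RANDOM index ordering requires a recent Db2 level, and the effect on access plans should be verified before deploying:

```shell
# Hypothetical example: add a RANDOM ordering index on an ever-ascending key
# to spread inserts across index pages and reduce cross-member reclaims.
db2 "create index dtw.orders_rnd on dtw.orders (o_id random)"
db2 "runstats on table dtw.orders for indexes all"
```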
9) Reclaims of SMP pages
The following example shows reclaims of SMP pages, which are often due to heavy inserts into a table with a too-small tablespace extent size.
====================================
Page reclaim metrics for SMP pages
====================================
select member, substr(tabschema,1,20) as tabschema, substr(tabname,1,40) as tabname, substr(objtype,1,10) as objtype, data_partition_id, iid, (spacemapp
MEMBER TABSCHEMA TABNAME OBJTYPE DATA_PARTITION_ID IID SMP_PAGE_RECLAIMS SMP_PAGE_RECLAIM_WAIT_TIME
------ -------------------- ---------------------------------------- ---------- ----------------- ------ -------------------- --------------------------
3 DTW ORDER_LINE TABLE - - 9 254
2 DTW ORDER_LINE INDEX - - 57 241
0 DTW ORDER_LINE INDEX - - 90 192
0 DTW ORDER_LINE TABLE - - 9 170
3 DTW ORDER_LINE INDEX - - 67 169
1 DTW ORDER_LINE INDEX - - 57 76
1 DTW ORDER_LINE TABLE - - 3 10
2 DTW ORDER_LINE TABLE - - 4 3
3 DTW NEW_ORDER TABLE - - 2 3
1 DTW NEW_ORDER TABLE - - 1 2
10 record(s) selected.
10) Latch wait time
A high percentage (more than 15-20%) of latch wait time (PCT_LTCH) can result from various causes. The following example shows latch waits by type.
====================
Latch wait metrics
====================
select member, substr(latch_name,1,60) as latch_name, total_extended_latch_wait_time as tot_ext_latch_wait_time_ms, total_extended_latch_waits
MEMBER LATCH_NAME TOT_EXT_LATCH_WAIT_TIME_MS TOT_EXT_LATCH_WAITS TIME_PER_LATCH_WAIT_MS
------ ------------------------------------------------------------ -------------------------- -------------------- ----------------------
0 SQLO_LT_SQLB_BPD_bpdLatch_SX 7182 4454 1.61
3 SQLO_LT_SQLB_BPD_bpdLatch_SX 4085 1965 2.07
2 SQLO_LT_SQLB_BPD_bpdLatch_SX 3086 2088 1.47
1 SQLO_LT_SQLB_BPD_bpdLatch_SX 1764 1724 1.02
0 SQLO_LT_SQLB_BPD_WARLatch 1272 1783 0.71
2 SQLO_LT_SQLB_BPD_WARLatch 1168 2637 0.44
3 SQLO_LT_SQLB_BPD_WARLatch 706 2110 0.33
1 SQLO_LT_SQLB_BPD_WARLatch 509 1870 0.27
0 SQLO_LT_SQLP_LHSH_hshlatch 263 758 0.34
3 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch 240 36 6.66
2 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch 221 45 4.91
1 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch 193 21 9.19
0 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch 175 25 7.00
This example shows no significant latching issue. For scenarios with more significant latch waits:
• In Db2 pureScale, there is a strong correlation between latch wait time and page reclaim time. Even 2-5% of reclaim wait time can end up causing 20-40% of latch wait time in some scenarios. Examining the reclaims report may indicate the table or index which needs alteration, especially if both latch and reclaim wait time are high on the same SQL statements.
• Significant improvements in latching were made in Db2 v11.1. If the hottest latch looks like SQLB_HASH_BUCKET_GROUP_HEADER then upgrading to v11 may reduce the time spent holding this latch.
• The bpdLatch type (SQLO_LT_SQLB_BPD_bpdLatch_SX) is associated with one or more hot buffer pool pages. Identify the table involved by looking at the statements with high latch wait time. One approach to reducing this contention is range partitioning the table and using partitioned indexes.
Document Information
Modified date:
16 June 2018
UID
swg22014557