IBM Support

What data should I collect for a PERFORMANCE issue with DB2 for Linux, UNIX and Windows?

Question & Answer


Question

If I am experiencing a PERFORMANCE issue with my DB2 for Linux, UNIX and Windows server, what data does DB2 support need me to collect?

Answer

Disclaimer - DB2 Universal Database (all versions) is a complex product with many components interacting with one another. This being the case, it is not possible to outline every diagnostic data collection that may be needed to resolve every problem that may arise. Each issue with DB2 is unique, but in most cases there is an initial round of data that provides a good overall view of DB2's behavior in a given circumstance. While this may be enough to identify the root cause, it is not uncommon for this initial round of data to indicate a problem in a specific area, requiring a second round of data to dig deeper into it.

In addition to the diagnostic data outlined below, DB2 support will need to have a detailed description of the problem. They will ask many questions to get an overall picture of the problem seen. During this time, additional diagnostic data may be requested. Also, after reviewing the initial set of data, additional diagnostics may be required.

For this reason, it is best to open a PMR before collecting data, so the support analyst can get details about the issue and tailor the initial data collection, minimizing the chance that additional collections will be required. However, there is still no guarantee that additional data collections will not be needed.

Must-gather for PERFORMANCE issue data collection:

Performance issues are defined as issues where DB2 is not performing as well as previously, or as expected. Performance issues fall into two categories. The first is a performance problem with a particular SQL statement: everything else is fine, but performance is bad for that one statement. The second is that overall performance on the system is bad: any queries or jobs submitted perform poorly. Each requires its own set of data.

Single SQL performing poorly

In this case, you would simply need to gather the information outlined in the "Collecting Data for DB2 Compiler Issues" Technote that can be found here:

http://www-01.ibm.com/support/docview.wss?rs=71&uid=swg21106550

Please gather the information as outlined under Method A.

Poor performance overall

These are the most difficult issues to resolve because so many components can contribute to a performance problem. In this case, again, if at all possible, a PMR should be opened before gathering data so that a support analyst can discuss the symptoms and plan what data to collect. However, a few items will most likely be needed in any case. Please note that this data must be gathered while the poor performance is happening; otherwise the data will not be useful.

If a performance issue is observed, then you should (as the instance owner) log in to the DB2 server itself, and execute the following command:

db2fodc -performance full

This command may take some time to complete, but it gathers information crucial to root cause analysis. If there is more than one PHYSICAL node for this instance, please execute the command on each PHYSICAL node that is experiencing the performance issue (i.e. if the problem is only on certain nodes).

It will produce a directory called "FODC_Performance_XXXXXXX" in the diagnostic directory (DIAGPATH, often db2dump) of the DB2 server/node, where XXXXXXX is the timestamp at which the command was executed.
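To locate the newest FODC directory after the command completes, here is a minimal sketch. The awk parsing is an assumption based on the usual "Diagnostic data directory path (DIAGPATH) = /path" line in the `db2 get dbm cfg` output; verify against your configuration output.

```shell
# Hypothetical helper: pull the DIAGPATH value out of "db2 get dbm cfg" output.
diagpath_from_cfg() {
    awk -F= '/\(DIAGPATH\)/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit }'
}

# On a live server you would pipe the real configuration through it:
#   DIAGPATH=$(db2 get dbm cfg | diagpath_from_cfg)
#   ls -dt "${DIAGPATH}"/FODC_Performance_* | head -1    # newest FODC directory
```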

Once the command completes, the FODC information can be gathered using the db2support command as follows:

If the issue is specific to one database:

db2support . -d <dbname> -c -s

If for the whole instance:

db2support . -s

Again, if there is more than one PHYSICAL node, then the command should be executed on all PHYSICAL nodes where the performance issue is happening.

Here is some additional data that can be gathered under more specific scenarios.

Overall slowdown (i.e. everything is slow and the slowdowns last for long durations -- say two hours in the morning and again in the afternoon).

For NON-DPF installations

I would suggest using the getsnaps data collection here (getsnaps is an external script that is not shipped with DB2; it is provided separately by IBM support).

Create a script called "collect.ksh" that contains the following:

/**************************/
DBNAME=$1
tstamp=`date "+%Y%m%d_%H%M%S"`
db2 connect to $DBNAME
db2 "call monreport.dbsummary(60)" > dbsummary.txt.$tstamp
db2 "select * from sysibmadm.bp_read_io" > bp_read_io.txt.$tstamp
db2 "select * from sysibmadm.bp_write_io" > bp_write_io.txt.$tstamp
db2 "call monreport.currentsql" > currentsql.txt.$tstamp
db2 terminate
/*************************/

Then execute getsnaps as follows:

getsnaps -database <dbname> -db -dbm -appl -everything -edus interval=5 -vmstat -iostat -s "./collect.ksh <dbname>" -db2trc 30 -period 300 -max 6

Zip up the files produced in the working directory and send in to the PMR.
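Each file that collect.ksh produces carries a timestamp suffix so that repeated samples never overwrite one another. The pattern in isolation (file name is illustrative):

```shell
# Timestamp-suffixed output files: one file per sample, sortable by name.
tstamp=$(date "+%Y%m%d_%H%M%S")    # e.g. 20180616_141530
echo "sample output" > "dbsummary.txt.${tstamp}"
```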

For DPF installations

Run it on one coordinator node plus at least two data nodes (pick any two). Create a script called "collect.ksh" that contains the block shown below the getsnaps command, then execute getsnaps as follows:

getsnaps -database <dbname> -db -dbm -appl -everything -edus interval=5 -node global -local -vmstat -iostat -s "./collect.ksh <dbname>" -db2trc 30 -period 300 -max 6

/**************************/

DBNAME=$1

PLAT=`uname -s`

##OS info
if [ "x$PLAT" = "xAIX" ];then
rah "iostat -RDTVl 1 15" > iostat_RDTVl.txt.`date "+%Y%m%d_%H%M%S"`
rah "iostat -T 1 15" > iostat_T.txt.`date "+%Y%m%d_%H%M%S"`
rah "uptime; svmon -G" > svmon_G.txt.`date "+%Y%m%d_%H%M%S"`
rah "uptime; ps -kelf" > ps_kelf.txt.`date "+%Y%m%d_%H%M%S"`
rah "vmstat -P all -w -t 1 10" > vmstat_P.txt.`date "+%Y%m%d_%H%M%S"`
rah "vmstat -w -t 1 10" > vmstat.txt.`date "+%Y%m%d_%H%M%S"`
elif [ "x$PLAT" = "xLinux" ];then
rah "vmstat 1 10" > vmstat.txt.`date "+%Y%m%d_%H%M%S"`
rah "iostat -xt 1 15" > iostat_xt.txt.`date "+%Y%m%d_%H%M%S"`
rah "iostat 1 15" > iostat.txt.`date "+%Y%m%d_%H%M%S"`
rah "uptime; ps -elf" > ps_kelf.txt.`date "+%Y%m%d_%H%M%S"`
rah "uptime; cat /proc/meminfo" > meminfo.txt.`date "+%Y%m%d_%H%M%S"`
fi

#Collect some Queue data, snapshot data

db2 connect to $DBNAME > /dev/null 2>&1
db2 "select snapshot_timestamp as timestamp,dbpartitionnum,count(*) as count from table(sysproc.snap_get_subsection(null,-2)) as t where ss_status = 'EXEC' group by snapshot_timestamp,dbpartitionnum having count(*) > 0 order by count desc" > partitionBottleneck.txt.`date "+%Y%m%d_%H%M%S"`

db2 "select snapshot_timestamp as timestamp,agent_id, rows_read, rows_written from table( snap_get_appl( null, -2)) as T order by rows_read desc fetch first 5 rows only" > appsRowsRead.txt.`date "+%Y%m%d_%H%M%S"`

db2 "select snapshot_timestamp as timestamp, agent_id, (pool_data_p_reads+pool_index_p_reads) as physical_reads, (pool_data_writes + pool_index_writes) as physical_writes from table( snap_get_appl( null, -2)) as T order by physical_reads desc fetch first 5 rows only" > appsPhysicalReads.txt.`date "+%Y%m%d_%H%M%S"`

db2 "select varchar(event_object,30) as event_object, varchar(event_object_name,45) as event_object_name, count(*) as count from table(WLM_GET_SERVICE_CLASS_AGENTS_V97(NULL,NULL,NULL,-2)) group by event_object, event_object_name" > WLMStats.txt.`date "+%Y%m%d_%H%M%S"`

db2 "SELECT snapshot_timestamp as timestamp,SUBSTR(BP_NAME, 1, 15) AS BP_NAME, TOTAL_PHYSICAL_READS, AVERAGE_READ_TIME_MS, DBPARTITIONNUM FROM SYSIBMADM.BP_READ_IO ORDER BY DBPARTITIONNUM" > BPReadIO.txt.`date "+%Y%m%d_%H%M%S"`

db2 "SELECT SUBSTR(BP_NAME, 1, 15) AS BP_NAME, TOTAL_WRITES, AVERAGE_WRITE_TIME_MS, DBPARTITIONNUM FROM SYSIBMADM.BP_WRITE_IO ORDER BY DBPARTITIONNUM" > BPWriteIO.txt.`date "+%Y%m%d_%H%M%S"`

db2 terminate > /dev/null 2>&1
/**************************/

Zip up the files produced in the working directory and send in to the PMR.


High CPU issue

Try to get OS information to find out what is happening to the CPU profile, i.e. whether %usr or %sys spikes.
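To spot whether %sys is spiking, here is a hedged sketch that filters vmstat samples. The field number is an assumption based on typical Linux vmstat output, where column 14 is sy; verify the column position on your platform.

```shell
# Hypothetical filter: print vmstat samples whose system CPU time exceeds a limit.
flag_high_sys() {
    # $1 = threshold in percent (default 30); skips the two vmstat header lines
    awk -v limit="${1:-30}" 'NR > 2 && $14 > limit { print "high %sys:", $0 }'
}

# Usage on a live system:
#   vmstat 1 10 | flag_high_sys 30
```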

db2pd -latches -edus interval=5 top=5
db2pd -db <dbname> -active -apinfo all

db2pd -dbptnmem -memsets -mempools -db <dbname> -inst

db2 "SELECT varchar(memory_set_type, 20) as set_type, varchar(db_name, 20) as dbname, memory_set_used, memory_set_used_hwm, memory_set_committed FROM TABLE(
MON_GET_MEMORY_SET(NULL, CURRENT_SERVER, -2))" > memset.out

db2 "SELECT varchar(memory_set_type, 20) AS set_type, varchar(memory_pool_type,20) AS pool_type,
varchar(db_name, 20) AS dbname, memory_pool_used, memory_pool_used_hwm FROM TABLE(
MON_GET_MEMORY_POOL(NULL, CURRENT_SERVER, -2)) " > mempool.out

On AIX, also get

tprof -skjuevl -x sleep 30

Zip up the files produced in the working directory and send in to the PMR.


If it's a utility issue ( LOAD, Backup, Restore, import, export, Reorg, Runstats )

Identify the application handle for the suspect job, then collect an apphandle-scoped trace and stacks for that handle. To identify the application handle for a LOAD, use "db2pd -db <dbname> -load".
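If you save the db2pd output to a file, here is a hedged sketch for pulling out the handle column. The "AppHandl" column name is an assumption based on typical db2pd output; check the header produced by your db2pd version.

```shell
# Hypothetical: print the AppHandl column from saved "db2pd -db <dbname> -load" output.
# Locates the column by name in the header line, then prints that field thereafter.
apphandles() {
    awk 'col && NF >= col { print $col }
         !col && /AppHandl/ { for (i = 1; i <= NF; i++) if ($i == "AppHandl") col = i }' "$1"
}

# Usage (assuming the output was saved to db2pd_load.txt):
#   apphandles db2pd_load.txt
```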


db2trc on -l 512m -apphdl <apphdl> -t    # note: that is a lowercase letter L (-l) before 512m, not the digit one
db2pd -stack apphdl=<apphdl> dumpdir=`pwd` -rep 2 3
db2trc dump trace.1.dmp
sleep 30
db2pd -stack apphdl=<apphdl> dumpdir=`pwd` -rep 2 3
db2trc dump trace.2.dmp
sleep 30
db2pd -stack apphdl=<apphdl> dumpdir=`pwd` -rep 2 3
db2trc dump trace.3.dmp
sleep 30
db2trc off

Then format the traces:

db2trc fmt trace.1.dmp trace.1.fmt
db2trc flw trace.1.dmp trace.1.flwt -t
db2trc fmt trace.2.dmp trace.2.fmt
db2trc flw trace.2.dmp trace.2.flwt -t
db2trc fmt trace.3.dmp trace.3.fmt
db2trc flw trace.3.dmp trace.3.flwt -t

Zip up the trace files produced in the working directory and send in to the PMR.

If it's a slow HADR issue

On the Primary and Standby:
db2pd -db <dbname> -hadr -rep 30 >> db2pd.hadr.out
db2pd -db <dbname> -appl -active -apinfo all -wlocks -rep 5 >> db2pd.applocks.out
db2pd -stack all dumpdir=`pwd` -rep 30 4


On Standby:
db2pd -db <dbname> -dpsprcb -rep 60 5 >> db2pd.sprcb.out
db2pd -db <dbname> -dpsdbcb -rep 60 5 >> db2pd.sdbcb.out

Zip up the files produced (including stacks) in the working directory and send in to the PMR.


If it's a paging issue

For NON-DPF installations

Run it on the single node.

For DPF installations

Run it on one coordinator node plus at least two data nodes (pick any two). Chances are high that the paging node itself is inaccessible, so substitute another data node for it.

Create a script called "collect.ksh" that contains the following:

/*************************/
DBNAME=$1
HOST=`hostname -s`
PLAT=`uname -s`

##OS info
if [ "x$PLAT" = "xAIX" ];then
rah "iostat -RDTVl 1 15" > iostat_RDTVl.txt.`date "+%Y%m%d_%H%M%S"`
rah "iostat -T 1 15" > iostat_T.txt.`date "+%Y%m%d_%H%M%S"`
rah "uptime; svmon -G" > svmon_G.txt.`date "+%Y%m%d_%H%M%S"`
rah "uptime; ps -kelf" > ps_kelf.txt.`date "+%Y%m%d_%H%M%S"`
rah "vmstat -P all -w -t 1 10" > vmstat_P.txt.`date "+%Y%m%d_%H%M%S"`
rah "vmstat -w -t 1 10" > vmstat.txt.`date "+%Y%m%d_%H%M%S"`
elif [ "x$PLAT" = "xLinux" ];then
rah "vmstat 1 10" > vmstat.txt.`date "+%Y%m%d_%H%M%S"`
rah "iostat -xt 1 15" > iostat_xt.txt.`date "+%Y%m%d_%H%M%S"`
rah "iostat 1 15" > iostat.txt.`date "+%Y%m%d_%H%M%S"`
rah "uptime; ps -elf" > ps_kelf.txt.`date "+%Y%m%d_%H%M%S"`
fi

tstamp=`date "+%Y%m%d_%H%M%S"`
db2 connect to $DBNAME

db2 "select current timestamp as snap_time, substr(host_name,1,20) as host, substr(db_name,1,15) as dbname, member, memory_set_type, memory_pool_type, memory_pool_id, application_handle, edu_id, memory_pool_used, memory_pool_used_hwm from table(mon_get_memory_pool(null,null,null)) where memory_pool_type like '%SORT%'" > sortheapConsumption.$HOST.$tstamp

db2 "select current timestamp as snap_time, substr(db_name,1,8) as DATABASE, substr(memory_set_type,1,12) as MEMORY_SET, member, MEMORY_SET_USED, MEMORY_SET_COMMITTED, MEMORY_SET_USED_HWM from table ( MON_GET_MEMORY_SET (null,null,null) )" > memsetInfo.$HOST.$tstamp

db2 "select current timestamp as snap_time, substr(host_name,1,20) as host, substr(db_name,1,15) as dbname, member, memory_set_type, memory_pool_type, memory_pool_id, application_handle, edu_id, memory_pool_used, memory_pool_used_hwm from table(mon_get_memory_pool(null,null,null)) " > mempoolInfo.$HOST.$tstamp

db2 "select current timestamp as snap_time, member, sum(memory_pool_used) as private_sort_used from table(mon_get_memory_pool('PRIVATE', '$DBNAME', null)) where memory_pool_type = 'SORT' group by 1,member" > totalSortHeapConsumption.$HOST.$tstamp

db2 "select current timestamp as snap_time, substr(service_superclass_name, 1, 30), substr(service_Subclass_name, 1, 30), t.member, q.APPLICATION_HANDLE , sum(memory_pool_used) as private_sort_used from table(mon_get_memory_pool('PRIVATE', '$DBNAME', NULL)) as t, table(wlm_get_service_class_agents(NULL,NULL,NULL,NULL)) as q where t.edu_id = q.agent_tid and t.member = q.dbpartitionnum and t.memory_pool_type = 'SORT' group by 1,service_superclass_name, service_subclass_name, t.member, q.APPLICATION_HANDLE order by 6 desc fetch first 10 rows only" > top10SortConsumers.$HOST.$tstamp

db2 terminate
/*********************************/
getsnaps -database <dbname> -db -dbm -app -age -active -apinfo all -memsets -mempools -dbptnmem -node global -local -vmstat -iostat -period 60 -max 5 -s "./collect.ksh <dbname>"

Zip up the files produced in the working directory and send in to the PMR.

Sending in the data:

The collected data should be uploaded to the associated PMR. If no PMR is open for the issue yet, please open one first. Once you have the PMR number, there are a couple of ways to upload the data.

Before uploading, please ensure that the data is named appropriately. If DB2 support has asked for the data to be named a certain way, please do so. If not, please use a sensible naming scheme. For example, if you have multiple db2support.zip files from multiple machines (i.e. a DPF environment with multiple physical nodes), you can name them like:

db2support.<hostname>.zip
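For example, a small sketch that stamps the local hostname into the archive name (the file names here are placeholders):

```shell
# Hypothetical: build a per-host name for the db2support archive before upload.
host=$(uname -n)
newname="db2support.${host}.zip"
# mv db2support.zip "$newname"    # rename the archive, then upload "$newname"
echo "$newname"
```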

1. You can use a browser to upload the data. Go to

http://www.ecurep.ibm.com/app/upload

Then follow the directions there.

2. Command-line FTP. Directions for uploading data can be found here:

http://www.ibm.com/support/pages/node/739283

Please be sure to follow the directions under "Prepare your data" to ensure that the upload and attachment to the PMR are successful; otherwise there may be delays in support gaining access to the data.

[{"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Component":"Performance - General\/Tuning","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF010","label":"HP-UX"},{"code":"PF016","label":"Linux"},{"code":"PF027","label":"Solaris"},{"code":"PF033","label":"Windows"}],"Version":"9.8;9.7;10.1;10.5","Edition":"Advanced Enterprise Server;Advanced Workgroup Server;Enterprise Server;Workgroup Server","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
16 June 2018

UID

swg21700589