DB2 recommended APARs and tuning for DB2 INSERT performance This information apar is opened for recommended APARs and tuning to improve DB2 INSERT performance based on recent observations from analysis and solution to customers using SAP application. Those recommendation should also apply to general customers. ---------------------------------------------------------------- Recommended APARs: PM91835 condition: UTS and single table classic segmented non Member Cluster tablespace with either versioned alters or dictionary build by insert symptom: loop, excessive getpages during insert problem: When searching the available data page within a space map page, if it reaches the allowable locking failure threshold or page false lead threshold, the current space map page is skipped to enhance the space search performance. When determining the next space map page to search the last physical page is used to find the next anchor of the segment. However, if the last page of the current space map page is a system page then it is incorrectly used to determine the anchor linkage of the next segment. As the result, looping through visiting the same segment within the space map page may occur. This enhancements only applies to the classic segmented table space with a single table or a Universal table space. . PM93688 condition: UTS with Member Cluster symptom: excessive getpages problem: During the space search process of INSERT operation, a relative segment was used instead of the absolute segment number when locating the space map page covered by the target segment to search for space. As the result, the same space map page was searched repeatedly and caused additional get page counts. . PM97867 condition: UTS with insert and delete activity symptom: excessive getpages problem: When deleted space distributes through the table space, the subsequent insert process could result in high get page count. This is due to the false lead threshold that was increased to improve space reuse within the candidate segment or during the exhaustive search prior to physical extend. However, the extra page search does not fit for the situation where the data page has unusable free space. As a result, the get page count would increase by visiting more pages for each . PM99792 condition: UTS with insert & delete activity, Page Level Locking (PLL) and long running uncommitted UR in >= V10 NFM symptom: excessive getpages problem: Under V10 or V11 NFM with heavy concurrency of insert and delete operations on a UTS table space, insert performance can be severely degraded if there exists a long running UR without performed a single commit. This long-running UR can affect the INSERT performance even if accesses other tables only. This long running UR could delay the advance of commit lsn and therefore make all of deleted and committed space become unavailable for reuse. . PI04942 condition: UTS non-Member Cluster 256G DSSIZE with > 214G data, and Member Cluster with > 5.1G data symptom: severe insert performance degradation. possible DSNISGSC:5001 problem: When a table contains larger quantity of data rows on UTS, DB2 mistakenly calculated a wrong segment number due to the use of smaller integer fields. This wrong segment calculation would degrade the overall insert performance. . PI06926 PI12398 condition: UTS symptom: excessive getpages problem: Slow performance in consecutive INSERT including Multiple Row INSERT in the same commit scope because the DB2 tries to reuse space efficiently by visiting same set of possible available deleted space. However, the uncommitted delete space prevent space reuse and cause long space search and further degrade performance. . PI07513 condition: GBP Dependent objects symptom: Possible high other write I/O for insert or other sql update problem: Over-detection of the Vertical Deferred Write Queue Threshold(VDWQT) caused by the presence of clean pages on the Virtical Deferred Write Queue(VDWQ). VDWQT writes are triggered based on the count of pages on the VDWQ, dirty of not. It's always been this way, and there's always been a possibility of clean pages on the VDWQ, but it became much more likely in V10 with Member Cluster. Possible high other write I/O for insert or other SQL update the page when the object is GBP-dependent. In some case, OTHER WRITE I/O is possibly reported as OTHER READ I/O incorrectly. . PI11661 condition: particularly for LOBs but possibly for other tables as well symptom: excessive getpages and high cpu utilization problem: The ability to reuse deleted space for LOBs, or for other tables using static scrollable cursors (or sharing a buffer pool with LOBs or tables using static scrollable cursors), is limited by the value of the global read-LSN. For the situation where read claims are continuously coming and going, but there is never a point in time where no read claims are held at all, DB2 may fail to advance the global read-LSN value for long periods of time even though the local table-level and buffer pool values are moving forward. . PI12709 condition: any tablespace with considerable insert/delete workload sharing a bufferpool in Data Sharing environment with UTS that has LOB or XML if there are long running read transactions against any object in the bufferpool. symptom: excessive locks and getpages per insert if PLL locking, excessive getpages per insert if Row Level Locking(RLL), especially evident with MAXROW(1) problem: Data page row IDs are not reused for insert due to the presumed presence of a scrollable cursor, even though there are no scrollable cursors present. This results in insert performance degradation and unnecessary table space growth. This situation can also occur when an object shares a buffer pool with another base table that contains LOB columns in the Universal Table space, LOB table space or aux index table space. . PI13695 condition: UTS with insert and delete activity symptom: excessive getpages problem: The deleted space under partition level, table space level or the table level exclusive lock, did not consider to reuse pseudo deleted space. The intention is to preserve the space for queries that request the current committed data feature. However, when there is an exclusive gross lock on the object, no other UR can access the data. Therefore, deleted space should be reused by insert operations within . PI18028 condition: TRUNCATE statement operates on segmented or UTS symptom: excessive getpages problem: During the exhaustive search for insert, if the page false lead count is exceeded after a few pages are searched within the space map page, the algorithm will search more pages before moving on to the next step. The segment chain of the last data page within the space map page is the anchor point of the next set of data pages to be searched. The high get page count occurs when the segment chain within the space map is not in physical acceding sequence. As a result, the same set of segments will be searched repeatedly when the process continues to exceed the data false lead count and further leads to a high get page count. The possible cause for the segment chain being out of physical sequence is Truncate or mass delete. . PI29726 PI31511 condition: segmented or UTS table space with page level locking symptom: hight CPU utilization in IRLM and XCFAS problem: When an insert transaction requests a page lock, and the transaction already holds the lock from a prior delete, it will trigger IRLM to alter the lock from a delete type to an insert type. If the global lock manager is on a different member, this alter request can cause the coupling facility to reject the alter, which in turn causes IRLM to reject the page lock request. The INSERT then continues to request page locks, and subsequent inserts do the same. When there is a high volume of inserts following a delete in the same commit scope, it can result in high CPU in the IRLM and XCFAS address spaces. This problem can also happen when there is a high volume of DELETEs following an INSERT in the same commit scope. ---------------------------------------------------------------- Recommended tuning: 1. Separate tablespaces without LOB or XML column from bufferpool of UTS tablespaces with LOB or XML column. condition: any tablespace with considerable insert/delete workload sharing a bufferpool in Data Sharing environment with UTS that has LOB/XML column if there are long running read transactions against any object in the bufferpool. symptom: excessive locks and getpages per insert if PLL locking, excessive getpages per insert if Row Level Locking(RLL), especially evident with MAXROW(1) Separate tablespaces without LOB/XML column from bufferpool where UTS tablespaces with LOB/XML column reside to avoid being impacted by read claim tracking for UTS tablespaces with LOB/XML column because V10 NFM added tracking by read Log Sequence Number(LSN) for UTS tablespaces that have LOB/XML column. Prior to V10 NFM, DB2 tracked oldest read LSN for LOB/XML tablespace for space reuse purposes. In datasharing the read claim tracking affects all objects within a bufferpool, causing DB2 to preserve deleted rows in case Sensitive Static Scrollable(SSS) cusrors are active. V10 NFM added tracking read LSN for UTS tablespace with LOB column for Sensitive Static Scrollable(SSS) cursors. If read LSN is pegged by a long running reader, data rows deleted since the reader started will not be reused, even if the delete is committed. A message DSNB260 reporting a long running read claim will show up (see Zparm LRDRTHLD). PI12709 will provide relief on the performance impact on the case that LOB/XML tables pace, LOB base UTS, XML base UTS are using the same buffer pool. It is a good practice to separate the LOB/XML from its base even with PI12709 being applied. Because there will be other turning that only apply to LOB at the buffer pool level. 2. Frequent commit for any long running uncommitted UR and clean up any indoubt threads condition: under long running uncommitted UR or indoubt thread existing symptom: excessive getpages In V10 UTS with PLL, a page lock proves committedness or ownership, but V10 has to determine the committedness of the page by checking a Commit Log Sequence Number (CLSN), because V10 insertions into UTS objects may additionally not reuse space freed by their own uncommitted work for rollbackability. If this checking fails due to any long running uncommitted UR in order to reuse any deleted space on the page, V10 treats the page it is holding as not committed, it will skip this page without reusing any of the deleted space on the page. PM99792 improved the reuse of space when V10 has a pseudo delete rows. A message DSNR035I reporting long running uncommitted UR will show up (see Zparm URCHKTH). In V10 UTS with RLL, the presence of long running URs hampers DB2's ability to detect if space is committed or not. This is true in all releases, but V10 UTS handles better than before V10 NFM. In before V10 UTS with PLL, possession of the page lock proved committedness or ownership and allowed all space to be used. So changing RLL to PLL is recommended, because RLL need CLSN for committedness. ---------------------------------------------------------------- Recommended documents for insert performance degradation to figure out the root cause: 1. DDL including database, table space, table and index definition. 2. Console dump with jobname=(ssnmDBM1,ssnmMSTR,ssnmIRLM,ssnmDIST) on the right member while an insert is going on. Sample dump operation: DUMP COMM=(any comment) reply to prompts; JOBNAME=(ssnmDBM1,ssnmMSTR,ssnmIRLM,ssnmDIST),CONT SDATA=(RGN,CSA,LSQA,SQA,PSA,SWA,LPA,TRT,GRSQ,SUM,ALLNUC),END *Where ssnm is the DB2 subsystem name *MAXSPACE 8000M minimum for DB2 V9, V10 and above would require 16000M+ MAXSPACE DISPLAY: D D,OPTIONS to verify the option 3. DB2 performance trace for 30 seconds to a few minutes by -STA TRACE(P) C(3,6,7) IFCID(198) TDATA(CPU,COR,TRA,DIST) DEST(GTF) [PLAN() AUTHID() PKGPROG()] * "PLAN() AUTHID() PKGPROG()" are optional to make the output smaller To avoid losing trace records, use multiple GTF datasets with BLOK=1M specified in GTF procedure. Here is a sample procedure on how to do that: //GTFDB2 PROC MEMBER=GTFDB2 //IEFPROC EXEC PGM=AHLGTF, // PARM='MODE=EXT,DEBUG=NO,TIME=YES,BLOK=1M,SIZE=8M', // REGION=0M //GTFOUTxx DD DSNAME=??????.TRACExx,UNIT=SYSDA,VOL=SER=xxxxxx, // DISP=(NEW,KEEP),SPACE=(CYL,(1000)),DCB=NCP=255 //* //* repeated 15 times for all the other trace data sets !! //* //SYSLIB DD DSNAME=SYS1.PARMLIB(&MEMBER),DISP=SHR If you use multiple datasets, you want to look for a AHL904I message when GTF is stopped identifying the trace data sets used: AHL006I GTF ACKNOWLEDGES STOP COMMAND AHL904I THE FOLLOWING TRACE DATASETS CONTAIN TRACE DATA : ????????.TRACE01 ????????.TRACE02 How to start and stop trace: 1) start GTF S GTF.G R xx,TRACE=USRP R xx,USR=(FB9),END R xx,U 2) start DB2 trace -STA TRACE(P) C(1,2,3,6,7,21) IFCID(198) TDATA(CPU,COR,TRA,DIST) DEST(GTF) [PLAN() AUTHID() PKGPROG()] 3) wait for 30 to 60 seconds 4) stop DB2 trace -DIS TRACE check TNO's you started -STO TRACE(P) TNO(pp) pp: TNO of Performance trace 5) stop GTF P G 4. DSN1PRNT with NODATPGS collected right after the performance trace above and before REORG to check exact space map bits. 5. DSN1COPY or IMAGE COPY of the partition after the performance trace above if you can determine "which partition" a row is inserted into by the degraded insert. 6. Insert key pattern (increasing or decreasing or random key for an insert) so that we can study how inserts are done by using cluster index. 7. Archive log data sets covering insert time from all of DB2 data sharing members.
Reported component name
PB LIB INFO ITE
Reported component ID
Last modified date
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following: