IBM Support

Method of estimation log space needed for transaction/workload.

News


Abstract

This technote describes a method of estimating log space needed for transaction / workload, based on extrapolation analysis of a small sample of data from the database.

Content

The goal is to estimate log space needed for rollback of particular UPDATE transaction. We are going to use extrapolation approach which will let us estimate on the log space needed based on data of the TEST database.


1. First we create test environment with database TEST on following level:

DB21085I This instance or install (instance name, where applicable:
"db2inst1") uses "64" bits and DB2 code release "SQL10053" with level identifier "0604010E".
Informational tokens are "DB2 v10.5.0.3", "s140203", "IP23551", and Fix Pack "3".
Product is installed at "/LARGEHDD/V10.5".

db2start
db2 create db TEST
mkdir /LARGEHDD/db2_fs
mkdir /LARGEHDD/db2_logs
mkdir /LARGEHDD/db2_logs_active
db2 connect to TEST
db2 "create bufferpool b1 size automatic pagesize 4096"
db2 "create tablespace t1 pagesize 4096 managed by system using ('/LARGEHDD/db2_fs/') bufferpool b1"
db2 "update db cfg using logarchmeth1 disk:/LARGEHDD/db2_logs/"
db2 "update db cfg using NEWLOGPATH /LARGEHDD/db2_logs_active"
db2 "update db cfg using LOGFILSIZ 1280" -- one log file is 5MB
db2 "update db cfg using LOGPRIMARY 2"
db2 "update db cfg using LOGSECOND 200"
db2 terminate
db2 backup db TEST to /dev/null


2. Next, lets analyze log space for one of the tables - for this purpose, lets create big table (note, that in order to simplify case, table created does not contain any indexes, referential constraints, or triggers).

db2 connect to TEST
db2 create table tech_test like SYSIBM.SYSTABLES in t1
i=1
while [[ $i -le 1000 ]] ; do
db2 "insert into tech_test select * from SYSIBM.SYSTABLES"
echo $i
i=$(expr $i + 1)
done

db2 "SELECT tabname, SUM(DATA_OBJECT_P_SIZE)+ SUM(INDEX_OBJECT_P_SIZE)+ SUM(LONG_OBJECT_P_SIZE)+ SUM(LOB_OBJECT_P_SIZE)+ SUM(XML_OBJECT_P_SIZE) FROM SYSIBMADM.ADMINTABINFO where TABNAME = 'TECH_TEST' group by tabname"

TABNAME 2
---------------- --------------------
TECH_TEST 3448144

1 record(s) selected.

Output in KBs: http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.1.0/com.ibm.db2.luw.sql.rtn.doc/doc/r0052897.html?cp=SSEPGG_10.1.0%2F3-6-1-3-0-18




3. In order to use extrapolation method, at first place we need tables for testing. Lets create tables like the big table in subject. Please note that in real case, you might also need to create all related objects like indexes, referential constraints, or triggers as those affects amount of logging needed.

db2 create table tech_test_1 like tech_test in t1
db2 create table tech_test_2 like tech_test in t1


Per approach chosen we will need additional 2 extrapolation points. We see total amount of rows in the table:

db2 "select count(*) from tech_test"

1
-----------
428898

1 record(s) selected.


therefore lets set extrapolation points as following:

db2 "insert into tech_test_1 select * from tech_test fetch first 25000 rows only"
db2 "insert into tech_test_2 select * from tech_test fetch first 50000 rows only"


We also need to know size of new tables, therefore lets get that with queries:

db2 "SELECT tabname, SUM(DATA_OBJECT_P_SIZE)+ SUM(INDEX_OBJECT_P_SIZE)+ SUM(LONG_OBJECT_P_SIZE)+ SUM(LOB_OBJECT_P_SIZE)+ SUM(XML_OBJECT_P_SIZE) FROM SYSIBMADM.ADMINTABINFO where TABNAME = 'TECH_TEST_1' group by tabname"

TABNAME 2
---------------- --------------------
TECH_TEST_1 201404


db2 "SELECT tabname, SUM(DATA_OBJECT_P_SIZE)+ SUM(INDEX_OBJECT_P_SIZE)+ SUM(LONG_OBJECT_P_SIZE)+ SUM(LOB_OBJECT_P_SIZE)+ SUM(XML_OBJECT_P_SIZE) FROM SYSIBMADM.ADMINTABINFO where TABNAME = 'TECH_TEST_2' group by tabname"

TABNAME 2
---------------- --------------------
TECH_TEST_2 402420



4. Lets check log space needed for update of 1 column (in real case, you might check real updates/inserts or even workloads on partial data):

First of all, lets check first extrapolation point (i.e. 25000 rows). Initial state:

db2 deactivate db TEST
db2 connect to TEST

db2inst1@shared:~> ls -la /LARGEHDD/db2_logs_active/NODE0000/LOGSTREAM0000/
total 10292
drwxr-x--- 2 db2inst1 db2iadm1 4096 2014-12-31 06:06 .
drwxr-x--- 3 db2inst1 db2iadm1 4096 2014-12-31 04:32 ..
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:06 S0000645.LOG
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:06 S0000646.LOG
-rw------- 1 db2inst1 db2iadm1 512 2014-12-31 04:32 SQLLPATH.TAG


Do the update:

db2 "describe select * from tech_test_1" > ./describe_info
UPDCOL=$(cat ./describe_info |grep -e "CHAR" -e "INT" |head -n 1 |awk {'print $4'})
db2 +c "update tech_test_1 set "$UPDCOL"='0'"


State after update:

db2inst1@shared:~> ls -la /LARGEHDD/db2_logs_active/NODE0000/LOGSTREAM0000/
total 46272
drwxr-x--- 2 db2inst1 db2iadm1 4096 2014-12-31 06:10 .
drwxr-x--- 3 db2inst1 db2iadm1 4096 2014-12-31 04:32 ..
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:10 S0000645.LOG
...
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:10 S0000653.LOG
-rw------- 1 db2inst1 db2iadm1 512 2014-12-31 04:32 SQLLPATH.TAG


Do rollback on the transaction:

db2 +c "rollback"


State after rollback:

db2inst1@shared:~> ls -la /LARGEHDD/db2_logs_active/NODE0000/LOGSTREAM0000/
total 46272
drwxr-x--- 2 db2inst1 db2iadm1 4096 2014-12-31 06:10 .
drwxr-x--- 3 db2inst1 db2iadm1 4096 2014-12-31 04:32 ..
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:10 S0000645.LOG
...
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:11 S0000653.LOG
-rw------- 1 db2inst1 db2iadm1 512 2014-12-31 04:32 SQLLPATH.TAG


NOTE: Although amount of logs have not increased, rollback produced some log records - we can see this in update timestamp of log file S0000653.LOG.


As we can see, update + rollback of 25000 rows consumed 8 log files (S0000645.LOG - S0000653.LOG) i.e. around 40MB. Taking into account physical size of the data, for update and rollback of 196MB table we used 40MB of log space.


Now lets do the same for 50000 rows. First of all lets recycle instance to have the state similar for all tests (i.e. initial logs used)):

db2 terminate
db2 deactivate db TEST
db2 connect to TEST

ls -la /LARGEHDD/db2_logs_active/NODE0000/LOGSTREAM0000/
total 10292
drwxr-x--- 2 db2inst1 db2iadm1 4096 2014-12-31 06:14 .
drwxr-x--- 3 db2inst1 db2iadm1 4096 2014-12-31 04:32 ..
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:14 S0000654.LOG
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:14 S0000655.LOG
-rw------- 1 db2inst1 db2iadm1 512 2014-12-31 04:32 SQLLPATH.TAG


Do update:

db2 +c "update tech_test_2 set "$UPDCOL"='0'"


State after update:

ls -la /LARGEHDD/db2_logs_active/NODE0000/LOGSTREAM0000/
total 92532
drwxr-x--- 2 db2inst1 db2iadm1 4096 2014-12-31 06:15 .
drwxr-x--- 3 db2inst1 db2iadm1 4096 2014-12-31 04:32 ..
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:15 S0000654.LOG
...
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:15 S0000671.LOG
-rw------- 1 db2inst1 db2iadm1 512 2014-12-31 04:32 SQLLPATH.TAG


Do rollback on the transaction:

db2 +c "rollback"


State after rollback:

total 92532
drwxr-x--- 2 db2inst1 db2iadm1 4096 2014-12-31 06:15 .
drwxr-x--- 3 db2inst1 db2iadm1 4096 2014-12-31 04:32 ..
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:15 S0000654.LOG
...
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:15 S0000671.LOG
-rw------- 1 db2inst1 db2iadm1 512 2014-12-31 04:32 SQLLPATH.TAG


As we can see, update + rollback of 50000 rows consumed 17 log files (S0000654.LOG - S0000671.LOG) i.e. around 85MB. Taking into account physical size of the data, for update and rollback of 393MB table we used 85MB of log space.


5. Now lets extrapolate:

Y1=40 - size of log space
X1=196 - size of data

Y2=85 - size of log space
X2=393 - size of data

We have 2 equations:

40 = 196m + b
85 = 393m + b

therefore:

40 = 196m + b -> b = 40 - 196m
85 = 393m + 40 - 196m -> 45 = 197m

m = 45/197 = 0.23
b = 40 - 196*0.23 = -5

Provided that log space needed is linear function, we can estimate with quotation: Y = 0.23*X - 5

Lets verify our estimate by checking on entire table (i.e. 428898 rows).

X=3367MB - size of the table
Y= ? - size of the log space

Y = 0.23*X - 5 = 0.23*3367 - 5 = 769.41

NOTE: Our log space configured is 202*5MB so we should fit.

db2 terminate
db2 deactivate db TEST
db2 connect to TEST

Initial state:

total 10292
drwxr-x--- 2 db2inst1 db2iadm1 4096 2014-12-31 06:27 .
drwxr-x--- 3 db2inst1 db2iadm1 4096 2014-12-31 04:32 ..
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:15 S0000671.LOG
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:27 S0000672.LOG
-rw------- 1 db2inst1 db2iadm1 512 2014-12-31 04:32 SQLLPATH.TAG


Do update:

db2 +c "update tech_test set "$UPDCOL"='0'"


State after update:

total 740172
drwxr-x--- 2 db2inst1 db2iadm1 4096 2014-12-31 06:33 .
drwxr-x--- 3 db2inst1 db2iadm1 4096 2014-12-31 04:32 ..
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:29 S0000671.LOG
...
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:33 S0000814.LOG
-rw------- 1 db2inst1 db2iadm1 512 2014-12-31 04:32 SQLLPATH.TAG


Do rollback on the transaction:

db2 +c "rollback"


State after rollback:

db2inst1@shared:~> ls -la /LARGEHDD/db2_logs_active/NODE0000/LOGSTREAM0000/
total 740172
drwxr-x--- 2 db2inst1 db2iadm1 4096 2014-12-31 06:42 .
drwxr-x--- 3 db2inst1 db2iadm1 4096 2014-12-31 04:32 ..
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:29 S0000673.LOG
...
-rw------- 1 db2inst1 db2iadm1 5251072 2014-12-31 06:42 S0000816.LOG
-rw------- 1 db2inst1 db2iadm1 512 2014-12-31 04:32 SQLLPATH.TAG



As we can see, update + rollback of entire table consumed 143 log files (S0000145.LOG - S0000003.LOG) i.e. around 715MB. Out estimation was following:

Y = 0.23*X - 5 = 0.23*3367 - 5 = 769.41

As we can see, the estimation is close to real value (difference might depend on existence of log records not directly related to the update, like for ex. tablespace increases)

To wrap up, in order to make general answer, we delivered estimation method that you can use in your database to determine expected log space needed.
You might want to extend the method above, and run your workload on test database containing sample of data to plan space needed in the production system.

Document information

More support for: DB2 for Linux, UNIX and Windows
Install/Migrate/Upgrade - Install/Setup - Command Line

Software version: 10.5

Operating system(s): AIX, HP-UX, Linux, Solaris, Windows

Reference #: 1694302

Modified date: 19 January 2015


Translate this page: