Improving performance for applications that access distributed data

The key to improving the performance of applications that access remote data is to limit the number of network transmissions.

About this task

Introductory concepts

A query that is sent to a remote system can sometimes take longer to execute than the same query, accessing tables of the same size, on the local DB2® subsystem. The principal reasons for this potential increase in execution time are:

The time required to send messages across the network
Overhead processing, including startup and communication subsystem session management

Some aspects of overhead processing, for instance, network processing, are not under DB2 control.

Monitoring and tuning performance in a distributed environment is a complex task that requires knowledge of several products.

Procedure

To maximize the performance of an application that accesses distributed data, use the following approaches:

Write any queries that access distributed data according to the following recommendations to limit the number of messages that these queries send over the network:
- Reduce the number of columns and rows in the result table by keeping the select lists as short as possible, and use the WHERE, GROUP BY, and HAVING clauses to eliminate unwanted data at the remote server.
- Specify the FOR FETCH ONLY or FOR READ ONLY clause when possible. Retrieving thousands of rows as a continuous stream is reasonable. Sending a separate message for each one can be significantly slower.
However, be aware that a query that is sent to a remote subsystem almost always takes longer to execute than the same query that accesses tables of the same size on the local subsystem for the following reasons:
- Overhead processing, including startup and negotiating session limits (if SNA is used)
- The time required to send messages across the network
For any of the following situations, use the OPTIMIZE FOR n ROWS clause in your SELECT statements and query result sets from stored procedures:
- The application fetches only a small number of rows from the query result set.
- The application fetches a large number of rows from a read-only query.
- The application rarely closes the SQL cursor before fetching the entire query result set.
- The application does not issue statements other than the FETCH statement to the DB2 server while the SQL cursor is open.
- The application does not execute FETCH statements for multiple cursors that are open concurrently and defined with the OPTIMIZE FOR n ROWS clause.
- The application does not need to scroll randomly through the data.
The OPTIMIZE FOR n ROWS clause limits the number of data rows that the server returns on each DRDA network transmission.

Restriction: This clause has no effect on scrollable cursors.

If you specify 1, 2, or 3 for n , DB2 uses the value 16 (instead of n) for network blocking and prefetches 16 rows. As a result, network usage is more efficient even though DB2 uses the small value of n for query optimization.
For example, the following SQL statement causes DB2 to prefetch 16 rows of the result table even though n has a value of 1.
```
SELECT * FROM EMP  OPTIMIZE FOR 1 ROW ONLY;
```
For queries that have potentially large result tables, but need only a limited number of rows, specify the FETCH FIRST n ROWS ONLY clause. This clause limits the number of rows that are returned to a client program.
For example, suppose that you need only one row of the result table. You can add the FETCH FIRST 1 ROW ONLY clause, as shown in the following example:
```
SELECT * FROM EMP  OPTIMIZE FOR 1 ROW ONLY  FETCH FIRST 1 ROW ONLY;
```
In this case, the FETCH FIRST 1 ROW ONLY clause prevents 15 unnecessary prefetches.
If your program accesses LOB columns in a remote table, use the following techniques to minimize the number of bytes that are transferred between the client and the server:
- Use LOB locators instead of LOB host variables.
  If you need to store only a portion of a LOB value at the client, or if your client program manipulates the LOB data but does not need a copy of it, LOB locators are a good choice. When a client program retrieves a LOB column from a server into a locator, DB2 transfers only the 4-byte locator value to the client, not the entire LOB value.
- Use stored procedure result sets.
  When you return LOB data to a client program from a stored procedure, use result sets rather than passing the LOB data to the client in parameters. Using result sets to return data causes less LOB materialization and less movement of data among address spaces.
- Set the CURRENT RULES special register to DB2.
  When a DB2 server receives an OPEN request for a cursor, the server uses the value in the CURRENT RULES special register to determine the type of host variables that the associated statement uses to retrieve LOB values. If you specify a value of DB2 for the CURRENT RULES special register before you perform a CONNECT, and the first FETCH statement for the cursor uses a LOB locator to retrieve LOB column values, DB2 lets you use only LOB locators for all subsequent FETCH statements for that column until you close the cursor. If the first FETCH statement uses a host variable, DB2 lets you use only host variables for all subsequent FETCH statements for that column until you close the cursor. However, if you set the value of CURRENT RULES to STD, DB2 lets you use the same open cursor to fetch a LOB column into either a LOB locator or a host variable.
  
  Although a value of STD for the CURRENT RULES special register gives you more programming flexibility when you retrieve LOB data, you get better performance if you use a value of DB2. With the STD option, the server must send and receive network messages for each FETCH statement to indicate whether the data that is being transferred is a LOB locator or a LOB value. With the DB2 option, the server knows the size of the LOB data after the first FETCH, so an extra message about LOB data size is unnecessary. The server can send multiple blocks of data to the requester at one time, which reduces the total time for data transfer.
For example, suppose that a user wants to browse through a large set of employee records and look at pictures of only a few of those employees. At the server, you set the CURRENT RULES special register to DB2. In the application, you declare and open a cursor to select employee records. The application then fetches all picture data into 4-byte LOB locators. Because DB2 knows that 4 bytes of LOB data is returned for each FETCH statement, DB2 can fill the network buffers with locators for many pictures. When a user wants to see a picture for a particular person, the application can retrieve the picture from the server by assigning the value that is referenced by the LOB locator to a LOB host variable. This situation is implemented in the following code:
```
SQL TYPE IS BLOB my_blob[1M];
SQL TYPE IS BLOB AS LOCATOR my_loc;
⋮
FETCH C1 INTO :my_loc;    /* Fetch BLOB into LOB locator  */
⋮
SET :my_blob = :my_loc; /* Assign BLOB to host variable */
```
Ensure that each cursor meets one of the following conditions when possible, so that DB2 uses block fetch to minimize the number of messages that are sent across the network:
- The cursor is declared with either the FOR FETCH ONLY or FOR READ ONLY clause.
- The cursor is a non-scrollable cursor, and the result table of the cursor is read-only.
- The cursor is a scrollable cursor that is declared as INSENSITIVE, and the result table of the cursor is read-only.
- The cursor is a scrollable cursor that is declared as SENSITIVE, the result table of the cursor is read-only, and the value of the CURRENTDATA bind option is NO.
- The result table of the cursor is not read-only, but the cursor is ambiguous, and the value of the CURRENTDATA bind option is NO.
  A cursor is ambiguous when any of the following conditions are true:
  - It is not defined with the clauses FOR FETCH ONLY, FOR READ ONLY, or FOR UPDATE.
  - It is not defined on a read-only result table.
  - It is not the target of a WHERE CURRENT clause on an SQL UPDATE or DELETE statement.
  - It is in a plan or package that contains the SQL statements PREPARE or EXECUTE IMMEDIATE.
For ODBC and JDBC applications, use the rowset parameter to limit the number of rows that are returned from a fetch operation.
If a DRDA requester sends the rowset parameter to a DB2 server, the server performs the following actions:
- Returns no more than the number of rows in the rowset parameter
- Returns extra query blocks if the value of the EXTRA BLOCKS SRV field on the DISTRIBUTED DATA FACILITY PANEL 2 installation panel on the server allows extra query blocks to be returned
- Processes the FETCH FIRST n ROWS ONLY clause, if it is specified
- Does not process the OPTIMIZE FOR n ROWS clause
Use the recommended values for certain bind options. For more information about the recommended bind options, see BIND options for distributed applications.

Remote access and distributed data
DB2 supports remote access between requestor and server relational database management systems (DBMS)
Serving systems and distributed data
The serving system is the DBMS system that runs the remotely bound package.
BIND options for distributed applications
Certain bind options can improve the performance of SQL statements that run as part of distributed applications.
Improving performance for SQL statements in distributed applications
In many cases, you can use certain strategies to improve the performance of SQL statements that run on distributed systems.
Enabling block fetch for distributed applications
Block fetch can significantly decrease the number of messages sent across the network.
Optimizing for very large result sets for DRDA
Enabling a DB2 client to request multiple query blocks on each transmission can reduce network activity and improve performance significantly for applications that use DRDA access to download large amounts of data.
Optimizing for small results sets for DRDA
When a client does not need all the rows from a potentially large result set, preventing the DB2 server from returning all the rows for a query can reduce network activity and improve performance significantly for DRDA applications.
Data encryption security options
Data encryption security options provide added security for the security-sensitive data that an application requests from the system. However, the encryption options can also have a negative impact on performance.

Parent topic: Programming applications for performance

Related concepts:

How DB2 identifies packages at run time

Saving storage when manipulating LOBs by using LOB locators

Fetching a limited number of rows

Enabling block fetch for distributed applications

Tuning TCP/IP

Tuning the VTAM system

Related reference:

BIND and REBIND options for packages and plans

fetch-first-clause