IBM Support

How reorg, statistics and row counts work in IBM DB2 Analytics Accelerator for z/OS

Product Documentation


Abstract

DB2 Analytics Acceleror for z/OS users complained that the row counts reported in the Accelerator view of IBM DB2 Analytics Accelerator Studio or by the ACCEL_GET_TABLES_INFO stored procedure were not accurate.
However, this is not true. The reported numbers are based on statistics that were collected at specific points in time, and are not just the result of a "count(*)" operation on the tables.

Since the concepts behind the row count are complex and difficult to explain (see below) the row count column was removed in current versions of IBM DB2 Analytics Accelerator Studio.

IDAA automates the reorganization and statistics collection of its tables. With the introduction of V2 PTF5 and V3 PTF2, the excecution time of these maintenance tasks changed. This change might have an impact on the table size and the performance of load and query tasks. This document explains when and how these automated tasks are run.

Content


Stable Transaction IDs

IBM DB2 Analytics Accelerator for z/OS implements multi-version concurrency control (MVCC) for its transactional processing. The set of rows that serves as the basis for a transaction does not change during the lifetime of the transaction. If two transactions, A and B, are running in parallel, transaction B might (logically) delete rows, but this deletion does not affect transaction A. Transaction A continues to work on the same set of rows, even if some of these were or will be deleted by transaction B. It does not matter whether transaction A has already accessed the table rows deleted by transaction B.

If a third transaction C is to delete these rows physically through the execution of a reorg statement, it must be ensured that the rows still needed by transaction A are not deleted.

To be able to determine the rows that can be physically deleted, each row is associated with a transaction ID indicating when the row was inserted and when it was logically deleted, that is, if it was deleted at all.

Additionally, IBM DB2 Analytics Accelerator for z/OS remembers the transaction ID of the last transaction that was committed before the start of the oldest transaction that is still running. This ID is called the "stable transaction ID". In the previous example, IBM DB2 Analytics Accelerator for z/OS would remember the ID of the last committed transaction before the start of transaction A.

Rows that were deleted logically after the start of transaction A will not be deleted physically by a reorg operation because transaction A might still have access to these rows.

The stable transaction ID is tracked through the entire system. Thus, it has an impact on all DB2 subsystems or data sharing groups that are connected to the same accelerator.

Reorg processing before V2 PTF5 and before V3 PTF2:

A reorg operation physically deletes table rows that were marked for deletion (logically deleted). If you reload a table or partition on an accelerator, the load task first deletes the existing rows from the table or partition, and then inserts then new rows that were obtained from DB2 for z/OS.

The reload triggers the execution of the reorg command on the affected tables right after finishing the SQL transaction for the load operation. The reorg process itself runs in a new transaction.

If another transaction, such as a query, is running at the same time, it might be that the reorg process starts, but cannot remove any rows although these rows were logically deleted by the load process. The reason for that is that the other transaction might still need access to these rows. Therefore, IBM DB2 Analytics Accelerator for z/OS must keep the stable transaction ID.

Before the fix pack releases V2 PTF5 and V3 PTF2, old and new table rows existed side by side after a reload. Although deleted rows did not influence the results of incoming queries, they still occupied disk space, and they still had to be scanned before they could be discarded as irrelevant for new queries. This resulted in a performance degradation.

Statistics processing before V2 PTF5 and before V3 PTF2:

The row count of the statistics command included rows of accelerator tables that could not be deleted physically for the reasons that were discussed before.

Reorg and statistics processing in V2 PTF5, V3 PTF2 and later:



To avoid incorrect statistics and actually reclaim disk space after a reload, IBM DB2 Analytics Accelerator for z/OS delays the reorganization until the stable transaction ID is higher then the transaction ID of the reload.
From a user's point of view, the reload is completed with the end of the load process itself, and DB2 for z/OS will return the correct number of rows (count(*)). Reorganization and statistics collection are not started before all other ongoing transactions, that is, those transactions that were started before or during the reload, have ended.

To this end, the reorganization task records the current "high watermark", the Netezza transaction ID, and stores it for the requested reorg operation. This is then called the "expected stable transaction ID". The reorg operation will be scheduled for execution only if the current stable transaction ID in Netezza is equal to or higher than the expected stable transaction ID. Thus, it is ensured that all transactions that might prevent the reclamation of disk space have been completed.

Hanging sessions

If an uncommitted transaction exists in the system, the stable transaction ID is held back.
Before the releases of V2 PTF5 and V3 PTF2, reorganization and statistics collection were run in such a case, but with no effect because logically deleted rows cannot be deleted physically.
This has changed with the introduction of V2 PTF5 and V3 PTF2. Reorganization and statistics collection processes do not start until the stable transaction ID is issued.
Irrespective of the release level, the table or the selected partitions grow with each reload.

A "hanging session" can be caused by an open DB2 query thread that has not yet fetched all result rows. In this case, the hanging session is automatically ended when the thread runs into a timeout, is aborted or canceled.

If the hanging session is instead caused by a software failure it might be necessary to restart the Netezza database. This is a very rare case that hardly ever occurs during regular operation. Do not restart the Netezza database manually unless you are advised so by IBM DB2 Analytics Accelerator for z/OS support personnel.

Starting with V3 PTF3, users are notified of hanging sessions with a DSNX881I message of subtype 21. This message is written to the SYSLOG.

[{"Product":{"code":"SS4LQ8","label":"Db2 Analytics Accelerator for z\/OS"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Component":"--","Platform":[{"code":"PF035","label":"z\/OS"}],"Version":"2.1.0;3.1.0;4.1.0;5.1.0","Edition":"All Editions","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
08 August 2018

UID

swg27038232