Making queries wait for incremental updates

To minimize the gap in data recency between the original Db2 for z/OS tables and incrementally updated accelerator-shadow tables (replication latency), you can delay the execution of queries until the latest incremental updates have been applied. This guarantees that an accelerated query against incrementally updated tables always "sees" the latest changes that have been committed in Db2 for z/OS.

This capability fulfills the criteria for Hybrid Transactional Analytical Processing (HTAP). It allows you to run transactional processes as well as analytical queries on a unified data store with a minimized time lag between transactional completion and analytical insight. Informed business decisions can be made much faster, on the basis of much more recent data.

To activate the delay, you must add a special register to your queries. You can also use a ZPARM or bind option to the same effect. The special register is:

SET CURRENT QUERY ACCELERATION WAITFORDATA = n.m.

where n.m. is a decimal number denoting seconds. The minimum is 0.0 (no delay). The maximum is 3600.0 (a delay of one hour).

Important: If you want to use a delay, specify at least 20 seconds. Lower values might work, but you must take the time into account that is needed to scan and read the SYSLOG and to report the findings. Experience has shown that realistic delays are around 20 seconds long.

Example

SET CURRENT QUERY ACCELERATION = ENABLE;
SET CURRENT QUERY ACCELERATION WAITFORDATA = 120.0;
SELECT * FROM XYZ WHERE [...]

The first special register in line 1 enables query acceleration. The second special register sets the delay to 120 seconds. What follows is the SQL code for the query. The execution of the query will be postponed for 120 seconds. During that time, the replication engine tries to apply all committed changes that have accumulated before the query started. If the replication engine cannot finish the job during that period, the query will not be accelerated. The further handling depends on the option set in the first special register and on the expiration behavior configured on the IBM Db2 Analytics Accelerator Console. The first special register is set to ENABLE. That means the query fails if one the conditions for query acceleration is not met. Let's assume that the conditions can be met; so normally, the query would run on the accelerator. However, the default expiration behavior is to abort a query if the incremental updates cannot be applied in time. Supposing the default behavior is set, the query would indeed fail if not all of the pending incremental updates can be applied in 120 seconds.

How to calculate the proper delay time

You certainly ask yourself how to determine the proper delay time (WAITFORDATA value) for your environment. There are two values by which an initial delay time or starting point can be calculated: the replication latency and the replication velocity. The replication latency is the time that the incremental update process is lagging behind. The replication velocity is a measure for the speed at which data changes could be applied in the past. You get these two values by running the following Db2 command:
-DIS ACCEL(<name>) DETAIL
where <name> is the name of the accelerator attached to the Db2 subsystem. If incremental updates have been configured for this pair of Db2 subsystem and accelerator, you find the two values in the command output. For example:
...
CURRENT REPLICATION LATENCY FOR THIS DB2 SYSTEM = 2000 MS
NUMBER OF SUCCESSFUL QUERY REQUESTS WITH DELAY PROTOCAL FOR ALL DB2 SYSTEMS = 0
NUMBER OF EXPIRED QUERY REQUESTS WITH DELAY PROTOCAL FOR ALL DB2 SYSTEMS = 1
REPLICATION VELOCITY = 0.04 DB2 LOG SECONDS APPLIED PER SECOND
...

In this example, The latency is 2000 milliseconds or 2 seconds, and the velocity is 0.04. You get the proper WAITFORDATA value by dividing latency by velocity. For the example, this gives you a WAITFORDATA value of 50 seconds to start with.

See Note on replication latency.

Conditions for the success or failure of a query

Whether a query fails depends on the setting of the SET CURRENT QUERY ACCELERATION special register and on the expiration behavior. It also depends on the type of the table. For accelerator-only tables, the WAITFORDATA option is always ignored because it can be assumed that the data in these tables is always the most recent data available (there is no other data in a Db2 for z/OS source table).

If the WAITFORDATA option has been set for queries that reference incrementally updated tables as well as tables not enabled for incremental updates (loaded tables), the query might fail if loaded tables are to be excluded and the expiration behavior is set to FAIL.

For information on how to configure the expiration behavior and the treatment of loaded tables, follow the links to Configuring the expiration behavior of WAITFORDATA queries and Including or excluding loaded tables from WAITFORDATA queries at the end of this topic.

Table 1 lists the possible combinations of settings and their effect with regard to query processing. It is assumed that an incoming query satisfies the conditions for query acceleration.
Table 1. Settings that influence WAITFORDATA query processing
CURRENT QUERY ACCELERATION WAITFORDATA Expiration behavior Loaded tables Query processing
NONE N/A N/A N/A Db2 for z/OS
ENABLE / ELIGIBLE / ALL 0.0 fail Include Accelerator
Exclude Query fails
continue Include Accelerator
Exclude Accelerator
> 0.0 fail Include Query fails if outstanding incremental updates cannot be a applied within WAITFORDATA period.
Exclude Query fails
continue Include Accelerator. The query might return results that are based on obsolete data
Exclude Accelerator. The query might return results that are based on obsolete data
ENABLE WITH FAILBACK 0.0 fail Include Accelerator
Exclude Db2 for z/OS
continue Include Accelerator
Exclude Accelerator. The query might return results that are based on obsolete data
> 0.0 fail Include Db2 for z/OS
Exclude Db2 for z/OS
continue Include Accelerator. The query might return results that are based on obsolete data
Exclude Db2 for z/OS
Note: A setting of the Loaded table parameter has an effect only if loaded tables are referenced by the query

Dependencies on other processing states, configurations, or product features

Other processing states or the use of certain configuration settings or other product features might affect query processing if WAITFORDATA is enabled (value > 0.0):

  • If continuous incremental updates are enabled, and one of the tables referenced in the query is currently reloaded, or in the spilling or draining state, the query will fail or be processed by Db2 for z/OS (ENABLE WITH FAILBACK). This is because the end of these processes or table states cannot be predicted, and in most cases, they take much longer than the WAITFORDATA period.
  • If continuous incremental updates are not enabled, the reload of a table stops the incremental update process. The end of the reload process cannot be predicted. It can easily take several hours. The execution of the query would have to wait until the process has finished and incremental updates are re-enabled. This is probably much more time than a user who is waiting for the results of a query is willing to accept. Therefore, WAITFORDATA queries will fail.
  • If partitions of a table have been archived with the High Performance Storage Saver, WAITFORDATA processing can still be used for the active partitions of the table. The archived partitions do not receive incremental updates, so a WAITFORDATA setting will not have an effect.
  • A setting of WAITFORDATA > 0.0 is ignored if an external stored procedure is run for in-database analytics or predictive modeling (IBM Netezza® Analytics). The external stored procedures are executed immediately, processing whatever data is currently available.
  • If workload balancing is used in a high-availability setup, a WAITFORDATA setting > 0.0 might lead to successful or failing queries. The decisive factor is the replication latency of the accelerator that a query is sent to. If the latency is higher than the WAITFORDATA period, the query fails. The problem is that one does not know beforehand to which accelerator a query is sent. The accelerator is picked at random.
  • If the product software on the accelerator is back-level, and does not support the WAITFORDATA protocol, a query does not necessarily fail. If CURRENT QUERY ACCELERATION is set to ENABLE, ELIGIBLE, or ENABLE WITH FAILBACK, the query might be processed by Db2 for z/OS. If, on the other hand, the special register is set to ALL, the query will fail. The same applies to the equivalent ZPARM or bind option settings.
  • WAITFORDATA processing is not possible if federated access has been granted to any of the tables referenced by a query. This is because the reference tables (referencing accelerator-only tables), which represent snapshots of the underlying accelerator-shadow tables, cannot be updated by the same incremental update process, and will inevitably be out-of-sync. A query against such tables will fail.
  • If a column has been added to a Db2 for z/OS table (add-column schema change), and the accelerator-shadow table of that table has been enabled for incremental updates, the success of a query with a setting of WAITFORDATA > 0.0 against this table depends on a successful schema synchronization within or before the start of the WAITFORDATA period. That is, the SYSPROC.ACCEL_SYNCHRONIZE_SCHEMA stored procedure or the equivalent function in your administration client must have been run successfully. If the stored procedure or client function has been started, but not yet completed, a successful run of an accelerated query against the table remains uncertain. It might or might not succeed, depending on the completion of the necessary changes to the catalog and the full reflection of the schema change in the accelerator-shadow table.
  • You cannot change the distribution key or the organizing keys of a table while incremental updates are enabled. To enable the Alter Keys function, you must first disable replication, and that means that a setting of WAITFORDATA > 0.0 will have no effect.
  • A failure of the active Netezza host has the effect that all queries with a WAITFORDATA setting will fail. However, these queries will be restarted and processed as usual after the passive host has taken over (failover).
  • Treatment of date, time, and timestamp values: An accelerated query that retrieves dates might give you the same values as Db2 for z/OS, but a query that selects time values or timestamps will probably produce different results. The reason is that the WAITFORDATA period and the difference between the z/OS system clock and the accelerator's system clock (=< 7 seconds) are added to these values while the incremental updates are being applied.