A common administrative operation in data warehouses is
to periodically roll in new data and roll out obsolete data. The following
scenarios illustrate these tasks.
Scenario 1: Rolling out obsolete data by detaching
a data partition
The following example shows how to detach
an unneeded data partition (DEC01) from a partitioned table named
STOCK. The detached data partition is used to create a table named
STOCK_DROP without any data movement.
ALTER TABLE stock DETACH PART dec01 INTO stock_drop;
COMMIT WORK;
To expedite the detach operation, index
cleanup on the source table is done automatically and in the background
through an asynchronous index cleanup process. If there are no detached
dependent tables defined on the source table, there is no need to
issue a SET INTEGRITY statement to complete the detach operation.
The
new table can be dropped or attached to another table, or it can be
truncated and loaded with new data before being reattached to the
source table. You can perform these operations immediately, even before
asynchronous index cleanup completes, unless the source table detached
dependent tables.
To determine whether a detached table is accessible,
query the SYSCAT.TABDETACHEDDEP catalog view. If a detached table
is found to be inaccessible, issue the SET INTEGRITY statement with
the IMMEDIATE CHECKED option against all of the detached dependent
tables. If you try to access a detached table before all of its detached
dependent tables are maintained, an error (SQL20285N) is returned.
Scenario 2: Creating a new, empty range
The
following example shows how to add an empty data partition (DEC02)
to a partitioned table named STOCK. The STARTING FROM and ENDING AT
clauses specify the range of values for the new data partition.
ALTER TABLE stock ADD PARTITION dec02
STARTING FROM '12/01/2002' ENDING AT '12/31/2002';
This ALTER TABLE…ADD PARTITION statement drains existing
static or repeatable-read queries that are running against the STOCK
table and invalidates packages on the table; that is, the statement
allows such queries to complete normally before it exclusively locks
the table (by using a Z lock) and performs the add operation. Existing
dynamic non-repeatable-read queries against the STOCK table continue,
and can run concurrently with the add operation. Any new queries attempting
to access the STOCK table after the add operation starts must wait
until the transaction in which the statement is issued commits. The
STOCK table is Z-locked (completely inaccessible) during this period.
Tip: Issue a COMMIT statement immediately after
the add operation to make the table available for use sooner.
Load
data into the table:
LOAD FROM data_file OF DEL
INSERT INTO stock
ALLOW READ ACCESS;
Issue a SET INTEGRITY statement
to validate constraints and refresh dependent materialized query tables
(MQTs). Any rows that violate defined constraints are moved to the
exception table STOCK_EX.
SET INTEGRITY FOR stock
ALLOW READ ACCESS
IMMEDIATE CHECKED
FOR EXCEPTION IN stock USE stock_ex;
COMMIT WORK;
Scenario 3: Rolling in new data by attaching a loaded
data partition
The following example shows how an attach
operation can be used to facilitate loading a new range of data into
an existing partitioned table (the target table named STOCK). Data
is loaded into a new, empty table (DEC03), where it can be checked
and cleansed, if necessary, without impacting the target table. Data
cleansing activities include:
- Filling in missing values
- Deleting inconsistent and incomplete data
- Removing redundant data that arrived from multiple sources
- Transforming the data through normalization or aggregation:
- Normalization. Data from different sources that represents
the same values in different ways must be reconciled as part of the
roll-in process.
- Aggregation. Raw data that is too detailed to store
in a warehouse must be aggregated during roll-in.
After the data is prepared in this way, the newly loaded data
partition can be attached to the target table.
CREATE TABLE dec03(…);
LOAD FROM data_file OF DEL REPLACE INTO dec03;
(data cleansing, if necessary)
ALTER TABLE stock ATTACH PARTITION dec03
STARTING FROM '12/01/2003' ENDING AT '12/31/2003'
FROM dec03;
During an attach operation, one or
both of the STARTING FROM and ENDING AT clauses must be specified,
and the lower bound (STARTING FROM clause) must be less than or equal
to the upper bound (ENDING AT clause). The newly attached data partition
must not overlap an existing data partition range in the target table.
If the high end of the highest existing range is defined as MAXVALUE,
any attempt to attach a new high range fails, because that new range
would overlap the existing high range. A similar restriction applies
to low ranges that end at MINVALUE. Moreover, you cannot add or attach
a new data partition in the middle, unless its new range falls within
a gap in the existing ranges. If boundaries are not specified by the
user, they are determined when the table is created.
This ALTER TABLE…ATTACH PARTITION statement
drains existing static or repeatable-read queries that are running
against the STOCK table and invalidates packages on the table; that
is, the statement allows such queries to complete normally before
it exclusively locks the table (by using a Z lock) and performs the
attach operation. Existing dynamic non-repeatable-read queries against
the STOCK table continue, and can run concurrently with the attach
operation. Any new queries attempting to access the STOCK table after
the attach operation starts must wait until the transaction in which
the statement is issued commits. The STOCK table is Z-locked (completely
inaccessible) during this period.
Tip: - Issue a COMMIT statement immediately after the attach operation
to make the table available for use.
- Issue a SET INTEGRITY statement immediately after the attach operation
commits to make the data from the new data partition available sooner.
The data in the attached data partition is not
yet visible because it is not yet validated by the SET INTEGRITY statement.
The SET INTEGRITY statement is necessary to verify that the newly
attached data is within the defined range. It also performs any necessary
maintenance activities on indexes and other dependent objects, such
as MQTs. New data is not visible until the SET INTEGRITY statement
commits; however, if the SET INTEGRITY statement is running online,
existing data in the STOCK table is fully accessible for both read
and write operations.
Tip: If data integrity checking, including
range validation and other constraints checking, can be done through
application logic that is independent of the data server before an
attach operation, newly attached data can be made available for use
much sooner. You can optimize the data roll-in process by using the
SET INTEGRITY…ALL IMMEDIATE UNCHECKED statement to skip range and
constraints violation checking. In this case, the table is brought
out of SET INTEGRITY pending state, and the new data is available
for applications to use immediately, as long as there are no nonpartitioned
user indexes on the target table.
Note: You
cannot execute data definition language (DDL) statements or utility
operations against the table while the SET INTEGRITY statement is
running. These operations include, but are not restricted to, the
following statements and commands:
- LOAD command
- REDISTRIBUTE DATABASE PARTITION GROUP command
- REORG INDEXES/TABLE command
- ALTER TABLE statement
- ADD COLUMN
- ADD PARTITION
- ATTACH PARTITION
- DETACH PARTITION
- CREATE INDEX statement
The SET INTEGRITY statement validates the data in the
newly attached data partition:
SET INTEGRITY FOR stock
ALLOW WRITE ACCESS
IMMEDIATE CHECKED
FOR EXCEPTION IN stock USE stock_ex;
Committing
the transaction makes the table available for use:
COMMIT WORK;
Any
rows that are out of range, or that violate other constraints, are
moved to the exception table STOCK_EX. You can query this table, fix
the rows, and insert them into the STOCK table.