IBM Support

An Overview of the Save-While-Active Process

Troubleshooting


Problem

This document provides an overview of the classic save-while-active function.

Resolving The Problem

The system performs the save-while-active function by maintaining an image of the object being saved as it exists at a single point in time (the checkpoint).

When a save request is initiated, the system allocates (locks) the objects and establishes checkpoint images for each memory-resident page of each object. At this point, applications other than the save must not hold strong locks on any of the objects being saved. After the system obtains the checkpoint images, the locks acquired on behalf of the save request are downgraded, and the objects can be updated again.

The following phases describe the flow for save-while-active (SWA) checkpoint processing.

Phase 1: Freeze the object and establish a checkpoint before starting the save operation. At the end of this phase, the object reaches a checkpoint, and the system sends the message CPI3712 (Save-while-active checkpoint processing complete).

Phase 2: A change to the object while it is being saved to media causes the system to keep a frozen image of the affected page:
a. A request updates page U1.
b. A copy of the original page is sent to the image page I1.
c. The change is made to the object. The copy of the original page (now page I1) is part of the checkpoint image for the object.

Phase 3: More change requests are made to the object (U2 and U3). Matching copies of the original pages (I2 and I3) are placed in the sidefile. The object is written to media with image pages I1 through I3 in place of the updated pages U1 through U3.

Phase 4: The shadow images of the changed pages of this object are no longer maintained because they are no longer required.

Phase 5: The object on the system has the U1, U2, and U3 changes. The copy or image of the object saved to the media does not contain those changes.
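
The five phases can be illustrated with a small copy-on-write model. This is only a sketch under stated assumptions, not the IBM i implementation: the class name SavedObject, its methods, and the page values are hypothetical, and a Python dictionary stands in for the image pages (I1 through I3).

```python
class SavedObject:
    """Toy model of an object made of pages (hypothetical, for illustration only)."""

    def __init__(self, pages):
        self.pages = list(pages)   # live pages that applications update
        self.sidefile = {}         # page index -> original (checkpoint) content
        self.checkpointed = False

    def checkpoint(self):
        # Phase 1: freeze the object; from now on originals must be preserved.
        self.sidefile.clear()
        self.checkpointed = True

    def update(self, index, value):
        # Phases 2 and 3: before the first change to a page after the
        # checkpoint, keep a copy of the original page as an image page.
        if self.checkpointed and index not in self.sidefile:
            self.sidefile[index] = self.pages[index]
        self.pages[index] = value

    def save_to_media(self):
        # Write the checkpoint image: an image page replaces its updated page.
        image = [self.sidefile.get(i, page) for i, page in enumerate(self.pages)]
        # Phase 4: the image pages are no longer needed once the save is done.
        self.sidefile.clear()
        self.checkpointed = False
        return image


obj = SavedObject(["A", "B", "C", "D"])
obj.checkpoint()
obj.update(0, "U1")
obj.update(1, "U2")
obj.update(1, "U2b")   # second change to the same page: original already kept
obj.update(2, "U3")
media = obj.save_to_media()
print(media)       # ['A', 'B', 'C', 'D']
print(obj.pages)   # ['U1', 'U2b', 'U3', 'D']
```

As in Phase 5, the object on the system ends up with the changes, while the copy written to media reflects the object as it was at the checkpoint.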

Notes:
1. The entire save fails if any concurrent application is using commitment control and there is an open commit cycle.
2. The save-while-active function uses more disk storage than normal save operations. As objects are changed, copies of the checkpoint data (shadow images) are made. If the data on your system uses a high percentage of the disk capacity, and much of the data is changed during a save-while-active operation, the system could run out of available storage. After the checkpoint image of an object is saved to media, the pages used for that image are no longer maintained and their storage can be recovered.
3. If an object is not immediately available during checkpoint processing, the save-while-active operation waits up to the number of seconds specified on the SAVACTWAIT parameter for the object to become available. While waiting for an object, the save operation does nothing else. The save operation might have to wait for several objects, so the total time that the save-while-active operation waits can be much longer than the value specified. If an object does not become available within the specified time, the object is not saved; however, the save operation continues.
4. Run save-while-active (SWA) operations during times of low system activity. A few interactive jobs, or batch jobs that are primarily read-only, are examples of activity that allows better system performance during the save-while-active operation. Between the time an object is checkpointed and the time it is written to media, the original of any page of its data that changes is copied to a "sidefile". For example, CLRPFM changes every page of the data, so if it is run against an object that is being saved with an SWA operation, all of the data is copied to the sidefile, which consumes system resources. Depending on the size and use of the file being cleared, this can significantly degrade system performance. Note also that a CLRPFM operation cannot be ended.
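
Notes 2 through 4 lend themselves to rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a sizing tool: the function names are hypothetical, the 4096-byte page size is an assumption made only for illustration, and the wait calculation assumes the save waits for unavailable objects one after another, as note 3 suggests.

```python
def worst_case_wait(unavailable_objects, savactwait_seconds):
    """Upper bound on total waiting when each unavailable object
    is waited for in turn, for up to the SAVACTWAIT value."""
    return unavailable_objects * savactwait_seconds


def sidefile_bytes(changed_pages, page_size=4096):
    """Extra storage if each changed page is copied once to the sidefile.
    page_size is an assumed value, not taken from the source document."""
    return changed_pages * page_size


# Five locked objects with a 120-second wait value: up to 600 seconds of waiting.
print(worst_case_wait(5, 120))      # 600

# Clearing a file of 1,000,000 pages during the save changes every page,
# so every page is copied to the sidefile.
print(sidefile_bytes(1_000_000))    # 4096000000
```

The second figure shows why an operation such as CLRPFM during a save-while-active operation can temporarily consume roughly as much storage as the file itself.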

Refer to the following links for additional information:

Performance considerations for save-while-active
 


Historical Number

367201106

Document Information

Modified date:
05 May 2022

UID

nas8N1015711