Setting up peer restart and recovery

To allow the product to restart on an alternate system, the following prerequisites must be installed on every system (your original system as well as any systems intended for recovery) before reconfiguring the ARM policies to enable peer restart and recovery.

Before you begin

Deprecated feature: Peer Restart and Recovery (PRR) functionality is deprecated. Use the integrated high availability support for the transaction service subcomponent instead of Peer Restart and Recovery for transaction recovery. See the topic Transaction support in WebSphere Application Server for more information about this support and how to configure it for peer recovery of transactions that were being processed on an application server that fails.

You must also make sure that all of the systems on which you might need to perform a restart are part of the same RRS log group.

z/OS® Version 1.2 or higher
BCP APAR OA01584
RRS APARs OA02556 and OA2556
WebSphere® Application Server Version 5 or higher
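
The version minimums in this list can be checked mechanically once you have an inventory of what is installed on each system. The following is only a sketch: the function names and the inventory dictionary are hypothetical, version strings are assumed to be dotted numbers such as 1.4 or 5.1, and the APARs are not checked because they have no simple numeric ordering.

```python
# Minimum levels taken from the prerequisite list above.
MINIMUM_LEVELS = {
    "z/OS": (1, 2),
    "WebSphere Application Server": (5,),
}


def parse_level(text):
    """Turn a dotted version string such as '5.1' into a tuple (5, 1)."""
    return tuple(int(part) for part in text.split("."))


def missing_prerequisites(installed):
    """Return the products that are absent or below the minimum level.

    installed is a hypothetical inventory: product name -> version string.
    """
    problems = []
    for product in sorted(MINIMUM_LEVELS):
        level = installed.get(product)
        if level is None or parse_level(level) < MINIMUM_LEVELS[product]:
            problems.append(product)
    return problems
```

Run the check once for each system that might be used for recovery, including the original system. An empty result means only that the version minimums are met; the APARs still have to be verified separately.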

Installing the prerequisite service updates on all of these systems will not hinder your current running environment if you want to continue to only restart in place. However, if this service is not installed, the controller might not be able to move back to its home system. In that case, OTS attempts to restart on the alternate system and fails. If any URs remain unresolved with RRS after this failure, the controller is not allowed to restart on the home system until RRS is cancelled on the alternate system. For more information on OTS and RRS, see z/OS MVS™ Programming: Resource Recovery.

If you do not plan to use peer restart and recovery, you do not need to abide by these functional prerequisites. Your system will instead use the restart-in-place function.

The following products all support RRS. Individually, they also support peer restart and recovery, providing that the previously listed prerequisites are all properly installed:

DB2® Version 7 or higher
IMS Version 8 or higher
CICS® Version 1.3 or higher
MQSeries® Version 5.2 or higher

In addition to the preceding products, many JTA XAResource Managers can participate in peer restart and recovery of the product. Consult your JTA XAResource Manager's documentation to determine if it supports restarting on an alternate system.

Avoid trouble: When setting up the ARM policy for a sysplex, make sure that both systems have the same level of the Application Server installed. For example, you cannot use an application server that is running WebSphere Application Server Version 5.1 to perform peer restart and recovery for an application server that is running WebSphere Application Server Version 6.0.1.
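
The level check described in this note can be sketched as follows. The function names and the system names are hypothetical, and "same level" is assumed to mean that the version strings match exactly.

```python
def group_by_level(levels_by_system):
    """Group system names by the Application Server level they report."""
    groups = {}
    for system, level in sorted(levels_by_system.items()):
        groups.setdefault(level, []).append(system)
    return groups


def same_level_everywhere(levels_by_system):
    """Return True when every system reports the same level."""
    return len(group_by_level(levels_by_system)) <= 1
```

For the combination in the note, same_level_everywhere({"SYS1": "5.1", "SYS2": "6.0.1"}) returns False, and group_by_level shows which systems are at which level.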

Prior to using peer restart and recovery:

You must ensure that the location service daemon and node agent are already running on all systems that might be used for recovery. Otherwise, recovery might be attempted on a system that is not running the location service daemon and node agent. If this happens, the server will fail to start, and recovery will fail.

Clients will see a performance impact if the systems are running at capacity. In an attempt to minimize the memory and CPU impact on the alternate system, the enterprise bean and web containers are not restarted for servers running in peer-restart mode. This means that application servers that are in the state of being recovered will not be able to accept any inbound work.

About this task

After the prerequisites are installed, starting a server on a system for which it was not configured implicitly places the server into peer restart and recovery mode. If you configured your XA partner log to write to a non-shared HFS, or if you are using a JTA XAResource Manager, you need to perform the following steps before starting a server:

Procedure

(Required only if you are using a non-shared HFS.) Enable non-shared HFS support.
When using a non-shared HFS, the configuration settings must be replicated across the different systems in the sysplex. This is done automatically by the deployment manager and node agent. To enable this support, each node agent in your configuration must be set as a recovery node. This change is made in the administrative console:
1. In the administrative console navigation, select System Administration > Node agents.
2. Select a node agent from the list.
3. In the Additional Properties section, select File Synchronization Service.
4. In the Additional Properties section, select Custom properties.
5. Select New.
6. Enter recoveryNode for Name, and true for Value. The Description field can remain blank.
7. Repeat steps 2-6 for each node agent in your configuration.
8. Save your configuration.
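
If you prefer scripting to the administrative console, the same change can probably be made with wsadmin in Jython mode. Treat the following as an unverified sketch rather than a documented procedure. It assumes that the File Synchronization Service of each node agent is represented by the configuration type ConfigSynchronizationService, and that a custom property can be added by creating a Property object under it. The function name mark_recovery_nodes is hypothetical. Confirm both assumptions against the wsadmin documentation for your release before using it.

```python
def mark_recovery_nodes(admin_config):
    """Add recoveryNode=true to every file synchronization service.

    admin_config is the wsadmin AdminConfig object, passed in explicitly.
    Returns the configuration IDs that were changed.
    """
    changed = []
    # Assumption: one ConfigSynchronizationService exists per node agent.
    listing = admin_config.list('ConfigSynchronizationService')
    for service_id in listing.splitlines():
        service_id = service_id.strip()
        if not service_id:
            continue
        # Assumption: custom properties are Property children of the service.
        admin_config.create('Property', service_id,
                            [['name', 'recoveryNode'], ['value', 'true']])
        changed.append(service_id)
    if changed:
        admin_config.save()  # corresponds to "Save your configuration"
    return changed
```

Inside a wsadmin session started with -lang jython, call mark_recovery_nodes(AdminConfig). The sketch does not check whether the property already exists, so running it twice would create a duplicate entry.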
(Required only if you are using JTA XAResource Managers.) Make sure that the appropriate logs and classes are available on the alternate system.
If you plan to use peer restart and recovery, and your applications access JTA XAResource Managers, you must ensure that the appropriate logs and classes are available on the alternate system.
1. Point the product variable TRANLOG_ROOT to a shared HFS.
  The TRANLOG_ROOT variable must point to a shared HFS, to which all systems in the cell can write. The XA partner log is stored here, and the alternate system must be able to read and update this log.
  1. In the administrative console, click Servers > Server Types > WebSphere application servers > server_name.
  2. Under Container Services, click Transaction Service.
  3. Enter the directory of the shared HFS in the Transaction log directory field.
2. Store the driver (for example, a JDBC driver, JMS provider, or JCA resource adapter) for each JTA XAResource Manager in an HFS that is readable by all systems in the cell.
  For example, if your connector is a JDBC driver for a database, the driver would likely be stored in a read-only HFS that is accessible by all systems in the sysplex. This allows the alternate system to read the saved classpath for the resource, and reconstruct it during a restart.
  If the connector used to access a JTA XAResource Manager is not stored in an HFS that is readable by all systems that might be used for recovery, when an application server restarts on an alternate system, it will either appear that there is no XA recovery work to do, or it will be impossible to load the classes necessary to communicate with the JTA XAResource Manager.
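
A quick way to catch the access problems described in this step is to test the paths from each system before you rely on recovery. The following is a minimal sketch. The function name and the path arguments are hypothetical, and the check shows only that the current system can reach the paths, not that the HFS is actually shared.

```python
import os


def check_shared_paths(tranlog_dir, driver_paths):
    """Return a list of access problems seen from the current system.

    tranlog_dir  -- directory entered as the transaction log directory
    driver_paths -- driver files (JDBC, JMS, JCA) that recovery must load
    """
    problems = []
    if not os.path.isdir(tranlog_dir):
        problems.append("transaction log directory not found: " + tranlog_dir)
    elif not os.access(tranlog_dir, os.R_OK | os.W_OK):
        problems.append("transaction log directory is not readable and writable: " + tranlog_dir)
    for path in driver_paths:
        if not os.access(path, os.R_OK):
            problems.append("driver is not readable: " + path)
    return problems
```

Run the check on the home system and on every alternate system, under the user ID that the application server runs as. An empty list on all systems suggests that the log and driver locations are reachable everywhere.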
Resolve InDoubt units.
During a recovery, there will be instances when manual intervention is required to resolve InDoubt units. You will need to use RRS panels for this manual intervention.