The high availability deployment manager

The high availability (HA) deployment manager function is configured using a shared file system. When this configuration option is chosen, multiple deployment managers are configured. The benefit of the HA deployment manager function is that the deployment manager is no longer the single point of failure for cell administration. This is important in environments relying on automated operations, including application deployment and server monitoring.

Deployment manager overview

The deployment managers exist as peers. One is considered active, also known as primary, and hosts the administrative function of the cell, while the others are backups in standby mode. If the active deployment manager fails, a standby takes over and is designated the new active deployment manager. A command line utility is provided to clone the original cell deployment manager into additional deployment managers. Each deployment manager is installed and configured to run on a different physical or logical computer. The deployment managers need not be hosted on homogeneous operating platforms, although like platforms are recommended. All deployment managers share the same instance of the master configuration repository and workspace area, both of which must be located on a shared file system.

The file system must support fast lock recovery. The IBM® General Parallel File System™ (GPFS™) is recommended, and the Network File System Version 4 (NFS) is also an option. If you use the high availability deployment manager on AIX® Version 5.3 and are using NFS Version 4, you must have bos.net.nfs.client Version 5.3.0.60 or later.

Avoid trouble: You must stop all deployment managers that are running in your environment before you perform maintenance on the NFS drive. Use the extended repository service in conjunction with the HA deployment manager feature. In the event of an NFS failure, you can recover the latest configuration changes by using the extended repository service.

Normal operation includes starting at least two deployment managers. A highly available deployment manager component runs in each deployment manager and controls which deployment manager is elected as the active one. Any other deployment manager in the configuration is in standby mode. The on demand router (ODR) is configured with the communication endpoints for the administrative console, the wsadmin tool, and scripting. The ODR recognizes which deployment manager instance is active and routes all administrative communication to that instance. The HA deployment manager function supports only the JMX SOAP connector. The JMX RMI connector is not supported in this configuration.

Note: After the HA deployment manager is set up, you access the administrative console through the ODRs rather than directly on the deployment manager system. For example, if computer C is odr1, access the administrative console at

http://odr1:port number/admin

instead of http://dmgr1:port number/admin, where port number is the port that is configured for WC_adminhost_secure or WC_adminhost.
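The addressing rule in the note above can be sketched as a small helper. This is an illustration only: the function name admin_console_url and the port values are hypothetical, and the choice of https for the WC_adminhost_secure port is an assumption that the note does not state.

```python
def admin_console_url(odr_host, port, secure=False):
    """Build the administrative console URL for access through an ODR.

    odr_host: host name of the ODR (for example "odr1"), not the
              deployment manager host.
    port:     the port configured for WC_adminhost, or for
              WC_adminhost_secure when secure=True.
    secure:   assumption: the secure port is reached over https.
    """
    scheme = "https" if secure else "http"
    return f"{scheme}://{odr_host}:{port}/admin"


if __name__ == "__main__":
    # Hypothetical port numbers, chosen only for illustration.
    print(admin_console_url("odr1", 9060))
    print(admin_console_url("odr1", 9043, secure=True))
```

Scripts and bookmarks that address the ODR keep working after a takeover, because they do not name either deployment manager host.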

Configuration

The deployment managers are initially configured into the same core group. Configuring the deployment managers in the same core group is important so that the routing information that is exposed to the ODR is consistent across all the deployment managers. If the deployment managers are placed into separate core groups, the core groups must be connected with a core group bridge.

A typical HA deployment manager configuration consists of two deployment managers that are located on separate workstations. The deployment managers share a master repository that is located on a SAN FS. All administrative operations are performed through the elected active deployment manager. The standby deployment manager is fully initialized and ready to do work but cannot be used for administration. This is because the administrative function does not currently support multiple concurrent server processes writing to the same configuration. Therefore, the standby rejects any login and JMX requests.

However, if the active deployment manager is stopped or fails, the highly available deployment manager component recognizes the loss of the active deployment manager and dynamically switches the standby into active mode so that it can take over for the lost deployment manager. The active and standby deployment managers share work spaces. When a deployment manager takeover occurs, work is not lost, because the ODR automatically recognizes the election of the new active deployment manager and reroutes administrative requests to it. Note that the deployment manager is unavailable for a period of less than one minute, until failover to the standby is complete.
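Automated administrative clients can tolerate the brief unavailability during takeover by retrying. The following is a minimal sketch, not part of the product: the helper name call_with_failover_retry, the 60 second budget, the 5 second interval, and the use of ConnectionError to signal an unreachable deployment manager are all assumptions made for illustration.

```python
import time


def call_with_failover_retry(operation, timeout_s=60.0, interval_s=5.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Call operation() and retry on ConnectionError until timeout_s elapses.

    operation:  a zero-argument callable that performs one administrative
                request through the ODR (hypothetical; supplied by the caller).
    timeout_s:  total retry budget; assumed to cover the takeover window.
    interval_s: pause between attempts.
    sleep and clock are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return operation()
        except ConnectionError:
            # Give up if another wait would exceed the retry budget.
            if clock() + interval_s > deadline:
                raise
            sleep(interval_s)
```

A caller would wrap each request, for example call_with_failover_retry(lambda: deploy_application(app)), where deploy_application is whatever function the automation already uses.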

Failover to the new active deployment manager is depicted in the following diagram:

Figure: ODR pair configured for communication availability

While the HA deployment manager component is able to detect deployment manager failure and initiate takeover, there are edge conditions where each deployment manager could temporarily believe that it is the active deployment manager. To prevent this situation, the active deployment manager holds a file lock in the shared file system. As a result, takeover by the standby takes approximately as long as the shared file system needs to detect the loss of the active deployment manager and release the lock. SAN FS and NFS both use a lock lease model and have configurable lock release times for failed lock holders. For SAN FS, the lock release time can be configured as low as 10 seconds.
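The role of the file lock can be illustrated with a short sketch. It is not the product's implementation: the class name ActiveLock and the lock file path are hypothetical, and it uses an advisory flock on a local POSIX file system, whereas the real lock lives on the shared file system and is released according to that file system's lease settings.

```python
import fcntl
import os


class ActiveLock:
    """Exclusive advisory lock that marks one process as the active peer."""

    def __init__(self, path):
        self.path = path  # hypothetical lock file location
        self.fd = None

    def try_acquire(self):
        """Return True if this peer now holds the lock, False otherwise."""
        if self.fd is not None:
            return True  # already held by this object
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # Non-blocking: fail immediately if another peer holds the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os.close(fd)
            return False
        self.fd = fd
        return True

    def release(self):
        """Release the lock so that a standby peer can acquire it."""
        if self.fd is not None:
            fcntl.flock(self.fd, fcntl.LOCK_UN)
            os.close(self.fd)
            self.fd = None
```

A standby peer that polls try_acquire() becomes active only after the previous holder releases the lock. When the holder fails instead of releasing, the lock becomes available only when the file system expires it, which is why the takeover time tracks the lock release time.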

Note: Although it is not considered a best practice, the high availability (HA) deployment manager can be used with a star topology in both the center cell and the point cells. This configuration requires that an ODR be present in every cell that uses the HA deployment manager, including the point cells. Requests cannot be routed through the point cell ODRs, which must be used for deployment management purposes only. Also, center cell ODRs cannot route traffic to point cell deployment managers.