You can use database configuration parameters to achieve optimal performance with DB2® high availability disaster recovery (HADR).
To achieve optimal performance with DB2 high availability disaster recovery (HADR), ensure that your database configuration meets the following requirements.
One exception to the configuration parameter behavior described in the previous paragraph is the logfilsiz database configuration parameter. Although this parameter is not replicated to the standby database, to guarantee identical log files on both databases, the standby database ignores the local logfilsizconfiguration and creates local log files that match the size of the log files on the primary database.
After a takeover, the original standby (new primary) takes the logfilsiz value that was set on the original primary until the database is restarted. At that point, the new primary reverts to the value configured locally. In addition, the new primary also truncates the current log file and resizes any pre-created log files.
If the databases keep switching roles as a result of a non-forced takeover and neither database is deactivated, then the log file size used is always the one established by the very first primary. However, if there is a deactivate and then a restart on the original standby (new primary) then it would use the log file size configured locally. This log file size would continue to be used if the original primary takes over again. Only after a deactivation and restart on the original primary would the log file size revert to the settings on the original primary.
By default, the log receive buffer size on the standby database is two times the value specified for the logbufszconfiguration parameter on the primary database. There might be times when this size is not sufficient. For example, when the HADR synchronization mode is asynchronous and the primary and standby databases are in peer state, if the primary database is experiencing a high transaction load, the log receive buffer on the standby database might fill to capacity and the log shipping operation from the primary database might stall. To manage these temporary peaks, you can increase the size of the log receive buffer on the standby database by modifying the DB2_HADR_BUF_SIZE registry variable.
If a load operation is executed on the primary database with the COPY YES option, the command executes on the primary database, and the data is replicated to the standby database as long as the copy can be accessed through the path or device specified by the LOAD command. If the standby database cannot access the data, the table space in which the table is stored is marked invalid on the standby database. The standby database will skip future log records that pertain to this table space. To ensure that the load operation can access the copy on the standby database, it is recommended that you use a shared location for the output file from the COPY YES option. Alternatively, you can deactivate the standby database while the load operation is performed, perform the load on the primary, place a copy of the output file in the standby path, and then activate the standby database.
If a load operation is executed on the primary database with the NONRECOVERABLE option, the command executes on the primary database and the table on the standby database is marked invalid. The standby database will skip future log records that pertain to this table. You can choose to issue the LOAD command with the COPY YES and REPLACE options specified to bring the table back, or you can drop the table to recover the space.
Because executing a load operation with the COPY NO option is not supported with HADR, the command is automatically converted to a load operation with the NONRECOVERABLE option. To enable a load operation with the COPY NO option to be converted to a load operation with the COPY YES option, set the DB2_LOAD_COPY_NO_OVERRIDE registry variable on the primary database. This registry variable is ignored by the standby database. Ensure that the device or directory specified on the primary database can be accessed by the standby database using the same path, device, or load library.
If you are using Tivoli® Storage Manager (TSM) to perform a load operation with the COPY YES option, you might need to set the vendoropt configuration parameter on the primary and standby databases. Depending on how TSM is configured, the values on the primary and standby databases might not be the same. Also, when using TSM to perform a load operation with the COPY YES option, you must issue the db2adutl command with the GRANT option to give the standby database read access for the files that are loaded.
Honoring peer window transition when breaking out of peer state ensures peer window semantics for safe takeover in all cases. If the primary fails during the transition, normal peer window protection still applies (safe takeover from standby as long as it is still in disconnected peer state).
On the standby side, after disconnection, the database continues replaying already received logs. After the received logs have been replayed, the standby reconnects to the primary. Upon re-connection, normal state transition follows (first remote catchup state, then peer state)
Relationship to HADR_TIMEOUT:
The hadr_timeout database configuration parameter does not break the primary out of peer state if the primary keeps receiving heartbeat messages from the standby while blocked. hadr_timeout is a timeout for the HADR network layer. An HADR database breaks the connection to its partner database if it has not received any message from its partner for the hadr_timeout period. It does not control timeout for higher layer operations such as log shipping and ack. If log replay on the standby database is stuck on a large operation such as load or reorganization, the HADR component still sends heartbeat messages to the primary database on normal schedule. In such a scenario, the primary is blocked as long as the standby replay is blocked, unless DB2_HADR_PEER_WAIT_LIMIT is set.
DB2_HADR_PEER_WAIT_LIMIT unblocks primary logging regardless of connection status. Note that even if DB2_HADR_PEER_WAIT_LIMIT is not set, the primary always breaks out of peer state when a network error is detected or the connection is closed (possibly as result of HADR_TIMEOUT).
Several new database configuration parameters are available to support HADR. Setting these parameters does not change the role of a database. You must issue the START HADR or STOP HADR commands to change the role of a database.
HADR configuration parameters are not dynamic. Any changes made to an HADR configuration parameter are not effective until the database has been shut down and restarted. In a partitioned database environment, the HADR configuration parameters are visible and can be changed, but they are ignored.
The local host name of the primary database must be the same as the remote host name of the standby database, and the local host name of the standby database must be the same as the remote host name of the primary database. Use the hadr_local_host and hadr_remote_host configuration parameters to set the local and remote hosts for each database. Configuration consistency for the local and remote host names is checked when a connection is established to ensure that the remote host specified is the expected database.
IP mode used for hadr_local_host | IP mode used for hadr_remote_host | IP mode used for HADR communications |
---|---|---|
IPv4 address | IPv4 address | IPv4 |
IPv4 address | IPv6 address | Error |
IPv4 address | hostname, maps to v4 only | IPv4 |
IPv4 address | hostname, maps to v6 only | Error |
IPv4 address | hostname, maps to v4 and v6 | IPv4 |
IPv6 address | IPv4 address | Error |
IPv6 address | IPv6 address | IPv6 |
IPv6 address | hostname, maps to v4 only | Error |
IPv6 address | hostname, maps to v6 only | IPv6 |
IPv6 address | hostname, maps to v4 and v6 | IPv6 |
hostname, maps to v4 only | IPv4 address | IPv4 |
hostname, maps to v4 only | IPv6 address | Error |
hostname, maps to v4 only | hostname, maps to v4 only | IPv4 |
hostname, maps to v4 only | hostname, maps to v6 only | Error |
hostname, maps to v4 only | hostname, maps to v4 and v6 | IPv4 |
hostname, maps to v6 only | IPv4 address | Error |
hostname, maps to v6 only | IPv6 address | IPv6 |
hostname, maps to v6 only | hostname, maps to v4 only | Error |
hostname, maps to v6 only | hostname, maps to v6 only | IPv6 |
hostname, maps to v6 only | hostname, maps to v4 and v6 | IPv6 |
hostname, maps to v4 and v6 | IPv4 address | IPv4 |
hostname, maps to v4 and v6 | IPv6 address | IPv6 |
hostname, maps to v4 and v6 | hostname, maps to v4 only | IPv4 |
hostname, maps to v4 and v6 | hostname, maps to v6 only | IPv6 |
hostname, maps to v4 and v6 | hostname, maps to v4 and v6 | IPv6 |
The primary and standby databases can make HADR connections only if they use the same format. If one server is IPv6 enabled (but also supports IPv4) and the other server supports IPv4 only, at least one of the hadr_local_host and hadr_remote_host parameters must specify an IPv4 address. This tells the database to use IPv4 even if the server supports IPv6.
When you specify values for the high availability disaster recovery (HADR) local service and remote service parameters (hadr_local_svc and hadr_remote_svc) while preparing an update database configuration command, the values you specify must be ports that are not in use for any other service, including other DB2 components or other HADR databases. In particular, you cannot set either parameter value to the TCP/IP port used by the server to await communications from remote clients (the SVCENAME database manager configuration parameter) or the next port (SVCENAME + 1).
If the primary and standby databases are on different machines, they can use the same port number or service name; otherwise, different values should be used. The hadr_local_svc and hadr_remote_svc parameters can be set to either a port number or a service name.
The synchronization mode (specified by the hadr_syncmode configuration parameter) and timeout period (specified by the hadr_timeout configuration parameter) must be identical on both the primary and standby databases. The consistency of these configuration parameters is checked when an HADR pair establishes a connection.
TCP connections are used for communication between the primary and standby databases. A primary database that is not connected to a standby database, either because it is starting up or because the connection is lost, listens on its local port for new connections. A standby database that is not connected to a primary database continues issuing connection requests to its remote host.
Although the local host and local service parameters (hadr_local_host, hadr_local_svc) are only used on the primary database, you should still set them on the standby database to ensure that they are ready if the standby database has to take over as the primary database.
When the primary database starts, it waits for a standby to connect for a minimum of 30 seconds or for the number of seconds specified by the value of the hadr_timeout database configuration parameter, whichever is longer. If the standby does not connect in the specified time, the startup fails. (The one exception to this is when the START HADR command is issued with the BY FORCE option.)
After an HADR pair establishes a connection, they will exchange heart beat messages. The heartbeat interval is one-quarter of the value of the hadr_timeout database configuration parameter, or 30 seconds, whichever is shorter. The hadr_heartbeat monitor element shows the current number of heartbeats that have been consecutively missed on the HADR connection. If one database does not receive any message from the other database within the number of seconds specified by hadr_timeout, it initiates a disconnect. This means that at most it takes the number of seconds specified by hadr_timeout for a primary to detect the failure of either the standby or the intervening network. If you set the hadr_timeout configuration parameter too low, you will receive false alarms and frequent disconnections.
If the hadr_peer_window database configuration parameter is set to zero, then when the primary and standby databases are in peer state, problems with the standby or network block primary transaction processing only for the number of seconds specified by the hadr_timeout configuration parameter, at most. If you set the hadr_peer_window to a non-zero value, then the primary database does not commit transactions until connection with the standby database is restored, or the hadr_peer_window time value elapses, whichever happens first.
The following sample configuration is for the primary and standby databases.
HADR_LOCAL_HOST host1.ibm.com
HADR_LOCAL_SVC hadr_service
HADR_REMOTE_HOST host2.ibm.com
HADR_REMOTE_SVC hadr_service
HADR_REMOTE_INST dbinst2
HADR_TIMEOUT 120
HADR_SYNCMODE NEARSYNC
HADR_PEER_WINDOW 120
HADR_LOCAL_HOST host2.ibm.com
HADR_LOCAL_SVC hadr_service
HADR_REMOTE_HOST host1.ibm.com
HADR_REMOTE_SVC hadr_service
HADR_REMOTE_INST dbinst1
HADR_TIMEOUT 120
HADR_SYNCMODE NEARSYNC
HADR_PEER_WINDOW 120