Configuring messaging engine and server behavior when a data store connection is lost

If the connection between a running messaging engine and its data store is lost, either because of a failure or because you stop the database for maintenance, you can configure the server to restart automatically so that the messaging engine functions correctly after the connection is restored.

About this task

The behavior described in this topic occurs only if the messaging engine is running and has established exclusive locks on its data store.

By setting the sib.msgstore.jdbcFailoverOnDBConnectionLoss custom property on a messaging engine, you can control how the messaging engine and its hosting server behave if the connection to the data store is lost.
Table 1. Behavior determined by the sib.msgstore.jdbcFailoverOnDBConnectionLoss custom property. The first column lists the values of the sib.msgstore.jdbcFailoverOnDBConnectionLoss custom property. The second column explains the behavior of the messaging engine when the data store connection is lost.
Property value Behavior when the data store connection is lost
true (default)
The high availability manager stops the messaging engine and its hosting application server when the next core group service Is alive check takes place (by default, this check runs every 120 seconds). If a node agent is monitoring the server and you have enabled automatic restart in the monitoring policy for the server, the server restarts. The messaging engine starts when an appropriate server is available.
Note: During the interval between Is alive checks, the messaging engine might accept messages with a reliability level lower than assured persistent, and those messages might be lost.
false

The messaging engine continues to run and accept work, and periodically attempts to regain the connection to the data store. If work continues to be submitted to the messaging engine while the data store is unavailable, the results can be unpredictable, and the messaging engine can be in an inconsistent state when the data store connection is restored.

Note: If work continues to be submitted to the messaging engine, even nonpersistent messaging can fail because the messaging engine might need to use the data store, for example to allocate a unique ID to a message, or to move nonpersistent messages out of memory.
false (z/OS)

The messaging engine continues to run and accept work, and periodically attempts to regain the connection to the data store.

Note: On z/OS, where a high availability environment is in place (incorporating clustered WebSphere Application Server servers and DB2 data sharing groups), the value false is preferred and recommended. One scenario where false is not appropriate is a cluster that has only one member, and therefore no other server for the messaging engine to fail over to.
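The decision table can be summarized as a small model. The following Python sketch is illustrative only: the Outcome class and the on_connection_loss function are hypothetical names, not part of any WebSphere API, and the model assumes that a node agent is monitoring the server. It captures only the end state described in Table 1, not the interval before the Is alive check.

```python
from dataclasses import dataclass


# Hypothetical model of Table 1; these names are not a WebSphere API.
@dataclass
class Outcome:
    engine_keeps_running: bool  # engine stays up and retries the connection
    server_stops: bool          # HA manager stops the hosting server
    server_restarts: bool       # node agent restarts the stopped server


def on_connection_loss(failover_on_loss: bool, auto_restart: bool) -> Outcome:
    """Return the documented end state after the data store connection is lost.

    failover_on_loss mirrors sib.msgstore.jdbcFailoverOnDBConnectionLoss.
    auto_restart mirrors "Automatic restart" in the server monitoring policy
    (assumption: a node agent is monitoring the server).
    """
    if failover_on_loss:
        # true (default): the engine and its server are stopped at the next
        # Is alive check; the server restarts only if automatic restart is on.
        return Outcome(engine_keeps_running=False, server_stops=True,
                       server_restarts=auto_restart)
    # false (including z/OS): the engine keeps running, keeps accepting work,
    # and periodically tries to regain the data store connection.
    return Outcome(engine_keeps_running=True, server_stops=False,
                   server_restarts=False)
```

For example, on_connection_loss(True, False) models the default property value with automatic restart disabled: the server stops and stays down until you restart it.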

Procedure

  1. Click Service integration -> Buses -> bus_name -> [Topology] Messaging engines -> engine_name -> [Additional Properties] Custom properties to navigate to the custom properties panel for the messaging engine.
  2. Click New.
  3. Type sib.msgstore.jdbcFailoverOnDBConnectionLoss in the Name field and true in the Value field.
  4. Click OK.
  5. Save your changes to the master configuration.
  6. Restart the application server.
  7. If you have a cluster, repeat the previous steps to add this property for every messaging engine in the cluster.
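In a cluster it is easy to miss one messaging engine when repeating these steps. The following Python sketch checks a hypothetical in-memory mapping of messaging engine names to their custom properties and reports the engines that do not have the property set to the expected value. How you collect that mapping (for example, by reviewing each engine's custom properties panel) is outside the scope of this sketch, and the function name is illustrative only. The comparison is deliberately exact and case-sensitive.

```python
PROPERTY = "sib.msgstore.jdbcFailoverOnDBConnectionLoss"


def engines_missing_property(engines: dict[str, dict[str, str]],
                             expected: str = "true") -> list[str]:
    """Return, sorted, the names of messaging engines whose custom properties
    do not set PROPERTY to exactly the expected value."""
    return sorted(name for name, props in engines.items()
                  if props.get(PROPERTY) != expected)
```

An empty result means that every engine in the mapping has the property set; any names returned still need steps 1 to 6.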

Results

If the connection between the messaging engine and its data store is lost, the application server that is hosting the messaging engine shuts down.

If you want the server to restart, ensure that Automatic restart is selected in the monitoring policy for the server.

What to do next

If a server restarts automatically in this situation, CWSID0039E messages appear in the JVM logs for the server.
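To confirm that an automatic restart happened, you can search the server's JVM logs for the CWSID0039E message ID. The following Python sketch filters log lines by that ID. It assumes the log is a plain text file (for example, the server's SystemOut.log), so the file path in the usage comment is an assumption that you should adjust for your installation. The function name is illustrative only.

```python
from typing import Iterable

MESSAGE_ID = "CWSID0039E"


def find_restart_messages(lines: Iterable[str]) -> list[str]:
    """Return the log lines that contain the CWSID0039E message ID,
    with trailing newlines removed."""
    return [line.rstrip("\n") for line in lines if MESSAGE_ID in line]


# Usage (the path is an assumption; adjust it for your installation):
# with open("SystemOut.log", encoding="utf-8", errors="replace") as log:
#     for line in find_restart_messages(log):
#         print(line)
```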

After a server restart, click Service integration -> Buses -> bus_name -> [Topology] Messaging engines to view the status of the messaging engine. Check that the messaging engine has been restarted and is running.

If the server is a member of a cluster, check that the cluster members are still enabled for high availability.

You might want to tune your system so that the loss of the database connection is detected quickly and so that the messaging engine waits a reasonable amount of time for the data store to become available again before it attempts to start on another server.
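One factor in detection time is the Is alive interval, which Table 1 gives as 120 seconds by default. The following Python sketch estimates when the next Is alive check occurs after a connection loss, under the simplifying assumption (not stated by the product documentation) that checks run on a fixed schedule of first_check_s + k * interval_s. Under that assumption the detection delay is always shorter than one interval, so a shorter interval narrows the window during which messages below assured persistent might be accepted and lost. The function name and parameters are hypothetical.

```python
import math


def next_is_alive_check(loss_time_s: float, interval_s: float = 120.0,
                        first_check_s: float = 0.0) -> float:
    """Return the time of the first Is alive check at or after loss_time_s,
    assuming checks run at first_check_s + k * interval_s for k >= 0."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    k = max(0, math.ceil((loss_time_s - first_check_s) / interval_s))
    return first_check_s + k * interval_s
```

For example, with the default interval, a connection lost 10 seconds after a check is detected at the check 120 seconds into the schedule, a delay of 110 seconds.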