Maintaining operations in a production environment

You can do maintenance, repair, and overhaul (MRO) tasks in a production environment by removing each server from the multiple-server topology, and then adding the servers back into the topology after the update is complete.

About this task

Do update and maintenance tasks on your Decision Server Insights production environment by removing individual servers from the topology. You do the update steps on the removed server, and then add the server back into the environment. You can also update multiple servers by removing them from the environment as a group, updating them, and then adding them back to the topology as a group.

When servers are returned to the multiple-server production topology, or new servers are added to the topology, the servers begin to store entity data and other data immediately. The servers might not pick up a portion of the event processing workload immediately, but they begin to do so over time. Because Insight Server both stores data and processes events, a balanced processing load across all the servers in the topology is not required for good performance.

Update the server hosts, or groups of server hosts, in this order:

  1. Catalog hosts
  2. Runtime hosts
  3. Connectivity hosts

Procedure

  1. For each catalog host, do the following steps.
    1. Run the server stop command to shut down the catalog server. Run this command from the <InstallDir>/runtime/wlp/bin directory, or make sure that this directory is in the system path. For example:
      server stop catalog_server1
    2. Update the catalog host.
      • Follow the procedures to update the operating system, or do other server maintenance tasks.
      • Determine which ports are used for catalog server communication and make sure that the port settings are correct in the server.xml file in the <InstallDir>/runtime/wlp/usr/servers/cisCatalog directory. The port assignments are included in the catalog cluster server endpoint entry.
        Important: If you change the name of the server during the update, you must update the server name in the server configuration settings. Edit the bootstrap.properties file and search for the server name.
    3. Restart the catalog server by running the server start command. Even if the update completes almost immediately, wait at least 5 minutes before you restart the catalog server.
      Warning: This pause is imperative to allow the system to reset the state. Rule agent state and global event aggregate state can be affected if this delay is not observed. If the catalog servers are started too soon, the system can suffer unrecoverable data loss.
      For example:
      server start catalog_server1
    4. Review the server status information in the log files, including messages.log. The log files are in the <InstallDir>/runtime/wlp/usr/servers/catalog_server1/logs directory.
  2. For each runtime host, do the following steps.
    Tip: If you do not have a recent backup, you might want to create a backup copy of the server configuration files, such as server.xml, bootstrap.properties, and objectGridDeployment.xml to preserve any recent configuration updates before you update your runtime servers.
    1. Shut down the runtime server by running the serverManager shutdown command. Before you take down an inbound connectivity server, redirect inbound traffic as needed, especially in environments where events are delivered over HTTP. For example:
      serverManager shutdown --catalogServerHost=runtime_server1 --catalogServerPort=2809
    2. To update the runtime host, follow the procedures to update the operating system, or do other server maintenance tasks.
    3. Restart the runtime server. For example:
      server start runtime_server1
      Important: When the grid resumes after an interruption, rebalancing occurs over time; no additional intervention is required. When a server is added to the topology, the server picks up storage work immediately and it picks up event processing work after a while. The time that is taken to rebalance the grid depends on the event processing workload, the amount of data that is managed by the system, and the number of servers in the topology.
  3. For each connectivity host, do the following steps.

    The procedure is similar to updating the catalog hosts and the runtime hosts. However, the steps to back up and restore configuration files are not necessary for the connectivity servers.

    1. Stop the connectivity servers by running the server stop command. For example, to stop an inbound connectivity server:
      server stop cisInbound
      For example, to stop an outbound connectivity server:
      server stop cisOutbound
    2. Update the hosts by following the procedures to update the operating system, or do other server maintenance tasks.
    3. Restart the connectivity server. For example, to start an inbound connectivity server:
      server start cisInbound
      For example, to start an outbound connectivity server:
      server start cisOutbound
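
Two points in the procedure are easy to get wrong when the steps are scripted: the minimum 5-minute pause before a catalog server is restarted, and the optional backup of configuration files before a runtime server is updated. The following is a minimal sketch of helper functions for both, not part of the product. The function names seconds_until_restart and backup_server_config, the MIN_RESTART_DELAY variable, and the choice to measure the pause from the moment the server was stopped are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only. All names below are hypothetical helpers, not product commands.

# Minimum pause before a catalog server is restarted, in seconds (5 minutes).
MIN_RESTART_DELAY=300

# Print how many seconds remain before a restart is allowed.
# $1 = epoch time when the server was stopped, $2 = current epoch time.
# Assumption: the pause is counted from the time of the stop command.
seconds_until_restart() {
    elapsed=$(( $2 - $1 ))
    remaining=$(( $MIN_RESTART_DELAY - elapsed ))
    if [ "$remaining" -lt 0 ]; then
        remaining=0
    fi
    echo "$remaining"
}

# Copy the configuration files named in the runtime host tip to a backup
# directory. $1 = server configuration directory, $2 = backup directory.
# Files that do not exist in the server directory are skipped.
backup_server_config() {
    mkdir -p "$2" || return 1
    for f in server.xml bootstrap.properties objectGridDeployment.xml; do
        if [ -f "$1/$f" ]; then
            cp -p "$1/$f" "$2/" || return 1
        fi
    done
}

# Possible usage for a catalog host (commented out so that sourcing this
# file has no side effects):
#   stopped_at=$(date +%s)
#   server stop catalog_server1
#   ... update the host ...
#   sleep "$(seconds_until_restart "$stopped_at" "$(date +%s)")"
#   server start catalog_server1
```

Computing the remaining time, rather than always sleeping for 5 minutes, avoids extra downtime when the host update itself already took longer than the required pause.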