IBM Support

Webserver Plugin configuration

Question & Answer


Question

How do you configure the WebSphere® web server plug-in for the best results?

Cause

Some users are confused about how the ServerIOTimeout setting affects how the WebSphere® web server plug-in handles failed requests. This document is intended to clear up that confusion.

There are competing factors that must be considered to determine the appropriate WebSphere web server plug-in settings for your environment. There is no single configuration that is correct for all environments. Consider the following facts to determine the best values for your environment.

  • Connections consume resources on both the machine where the plug-in is installed and the machine where the application server is installed.
  • Connections that do not terminate do not free these resources and can exhaust the resource pool.
  • Timeout values set too low cause the client to experience unnecessary failures and increase the traffic load from the retries.
  • Heavy traffic and intermediate issues can cause unexpected delays in responses.
  • When an application response is not received within the ServerIOTimeout specification, you can decide whether to continue to send requests to that server. If you continue to send requests to that server, you risk timeouts and failures if there is a problem on the machine. If you decide to halt requests to that server, you decrease your site capacity.
  • Requests that contain a message body and typically can change the application state, such as a POST request, must not be retried unless the application is designed to accept multiple instances of the same request.
  • Requests that do not contain a message body, such as GET and HEAD requests, are automatically retried by the plug-in when failures occur, and this functionality cannot be disabled. If you configure the plug-in to mark an unresponsive server offline, you must adjust other settings to ensure that a single request never disables your entire site.

Answer

The WebSphere web server plug-in has several settings that can be tuned to achieve different outcomes relative to timeouts and retries.

ServerIOTimeout setting

The ServerIOTimeout property is designed to allow requests to expire instead of waiting indefinitely for a response from the server.

If the ServerIOTimeout property is set to 0, do not expect requests to be retried. A request is not retried until the previous connection is broken. If you need to retry failed requests, set ServerIOTimeout to a nonzero value.

In current service releases, the default value of the ServerIOTimeout property is 900 seconds. The default value is a generously long value chosen to avoid unexpected timeouts in even the 99th percentile of slow applications. Choose a value slightly larger than the expected worst-case response time.

The ServerIOTimeout setting is based on how long the server takes to handle a request, not on a particular URI or application. When you specify a value for the ServerIOTimeout property, allow for the slowest, longest request time, and then add a little more time to handle peak operation situations.

A server is NOT marked as down if the ServerIOTimeout property is zero or a positive value and a response is not received within the ServerIOTimeout time period. Other requests continue to be sent to this server. When affinity is defined and the request contains session data, the same server is selected during a retry because the server is still available. The maximum number of times the same server is selected to process the request is the number of servers defined in the cluster. If the server is not healthy, sending requests back to the same server is not likely to result in a good response and can exacerbate a potential performance problem.

You can specify a negative value for the ServerIOTimeout property. When the ServerIOTimeout value is negative, the plug-in marks the server offline when a response is not received within the ServerIOTimeout time period. When the plug-in marks down the server, requests are not sent to that server until the interval specified for the plug-in RetryInterval property expires. If there is only one server in the cluster, it is never marked down regardless of the plug-in properties.
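
The mark-down rules described above can be summarized in a short Python sketch. The function name and its parameters are hypothetical and only model the behavior stated in this document; they are not part of the plug-in.

```python
def marks_server_down(server_io_timeout: int, cluster_size: int) -> bool:
    """Return True if a response timeout would mark the server down.

    Hypothetical model of the behavior described in this document,
    not actual plug-in code.
    """
    # A lone cluster member is never marked down, whatever the settings.
    if cluster_size <= 1:
        return False
    # Only a negative ServerIOTimeout marks the server down on a timeout.
    return server_io_timeout < 0


print(marks_server_down(900, 3))  # zero or positive: server stays in rotation
print(marks_server_down(-5, 3))   # negative: server is marked down
print(marks_server_down(-5, 1))   # single-member cluster: never marked down
```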

If the ServerIOTimeout property is negative and its absolute value is set too low, a single slow request can take every server in the cluster offline. If a long-running request exceeds the ServerIOTimeout value and is retried on other servers, the plug-in marks down each server in turn as the request fails. Therefore, if you specify a negative value for the ServerIOTimeout property, you must ensure that the value specified for the RetryInterval property falls within the following range:

  • The lowest value for the range is 1. With this value, the server is guaranteed only one second to try to recover. In practice, the minimum must be a value that gives the server a reasonable amount of time to recover.
  • The highest value for the range is 1 less than the result of multiplying the absolute value of the setting for the ServerIOTimeout property by one less than the number of servers in the cluster: (absolute value of ServerIOTimeout * (number of servers in cluster - 1)) - 1

For example, if you set the ServerIOTimeout property to -5, and you have 3 servers in the cluster, the value specified for the RetryInterval property must be in the range 1 to 9. Specifying such a value guarantees that the servers are never all marked down at the same time because of an unexpectedly intensive request.
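
The range calculation above can be checked with a small Python helper. This is a sketch only: the function name, parameter names, and error handling are assumptions made for illustration and do not correspond to any plug-in API.

```python
def retry_interval_range(server_io_timeout: int, cluster_size: int) -> tuple[int, int]:
    """Return the (lowest, highest) RetryInterval values, in seconds.

    Hypothetical helper that applies the formula from this document:
    highest = (abs(ServerIOTimeout) * (number of servers - 1)) - 1
    """
    if server_io_timeout >= 0:
        raise ValueError("the range only applies to a negative ServerIOTimeout")
    if cluster_size < 2:
        raise ValueError("a single-server cluster is never marked down")
    highest = abs(server_io_timeout) * (cluster_size - 1) - 1
    if highest < 1:
        # Assumption: with such a short timeout no RetryInterval fits the range.
        raise ValueError("no RetryInterval value satisfies the range")
    return 1, highest


# ServerIOTimeout of -5 with 3 servers, as in the example above.
print(retry_interval_range(-5, 3))  # (1, 9)
```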

ServerIOTimeoutRetry setting

The ServerIOTimeoutRetry property controls the number of times a request is retried when a timeout occurs.

If your application is designed such that a single request can have significant impact, such as a query that involves data locking, the ServerIOTimeoutRetry property can be used to prevent retries.

By default, this value is set to -1 meaning the plug-in module retries up to the number of members in the cluster.

If this value is set to 0, or the ServerIOTimeoutRetry attribute is absent, no retries occur after data is sent to the server.

Requests with bodies

Request types PUT and POST typically contain data in the body of the request. If a request using a different type has content in the body, the AcceptAllContent property must be set to true for the plug-in to process the data in the body. The PostBufferSize property controls whether the data in a request body is temporarily stored. If the data is not stored, the request cannot be retried on another server when a failure or timeout occurs. In 8.5.0.0 and later, this value is set to 0 by default, which disables retries of requests with bodies. In prior releases, this property was set to 64K by default.

Interaction with affinity

If a nonaffinity request is retried, the plug-in attempts to handle the request on each available server until a good response is received or all servers are attempted.

If an affinity request is retried and there is a positive ServerIOTimeout value, the request is retried on the affinity server only. If there is a negative ServerIOTimeout value, the affinity server is marked down, and different servers are chosen for retries.

It is recommended to set the ServerIOTimeoutRetry value such that, when combined with the ServerIOTimeout value, the product is the maximum time a client is expected to wait for a response. For example, if users accept a five-minute response time and the ServerIOTimeout value is set to 60 seconds, set the ServerIOTimeoutRetry value to five or less (assuming there are at least five members in the cluster).
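
The sizing rule above is simple arithmetic, sketched here in Python. The function and parameter names are hypothetical, and capping the result at the cluster size is an assumption based on the statement that retries go up to the number of cluster members.

```python
def max_timeout_retry(max_client_wait: int, server_io_timeout: int, cluster_size: int) -> int:
    """Largest ServerIOTimeoutRetry that keeps the total wait within budget.

    Hypothetical helper: the product of the retry value and the absolute
    ServerIOTimeout must not exceed max_client_wait (all in seconds).
    """
    per_attempt = abs(server_io_timeout)
    if per_attempt == 0:
        raise ValueError("ServerIOTimeout must be nonzero for retries to occur")
    # Assumption: there is no benefit in a value above the cluster size.
    return min(max_client_wait // per_attempt, cluster_size)


# Five-minute tolerance, 60-second ServerIOTimeout, five cluster members.
print(max_timeout_retry(300, 60, 5))  # 5
```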

The ServerIOTimeoutRetry property was introduced with APAR PM70559.


Example:

Environment:

  • There are three servers in the cluster.
  • The RetryInterval property is set to the default value of 60 seconds.
  • The ServerIOTimeout property is set to -5 seconds.
    NOTE: The value used is an example only. It is not meant to imply that you should use this value in your environment.
  • The ServerIOTimeoutRetry property is set to the default value of -1.
  • The response is not received within five seconds.
  • Assume t is the time the original request is received.

Problem:

  1. The plug-in sends the request to server 1 and waits five seconds for a response.
    No response is received.
    Server 1 is marked down at the time the request was received plus five seconds (t + 5).
  2. The request is now sent to server 2.
    No response is received within five seconds.
    The plug-in marks server 2 down at the time the original request was received plus ten seconds (t + 10).
  3. The request is now sent to server 3.
    A response is not received.
    The server is marked down at the time the original request was received plus fifteen seconds (t + 15).
    An error is sent to the client.

Because all of the servers are now marked down, all requests fail until the RetryInterval expires on a server. When the RetryInterval expires, a single request is allowed to be forwarded to the server. If a good response is received, the server is marked available to receive any request; otherwise, the server is marked offline for another RetryInterval time period.

  • Server 1 can receive a request 60 seconds after it was marked down (t + 5 + 60).
    This is the time the original request was received plus the five-second ServerIOTimeout period plus the 60 seconds for the RetryInterval setting.
  • Server 2 was marked down at time t + 10 and can receive a request 60 seconds later (t + 10 + 60).
  • Server 3 was marked down at time t + 15 and can receive a request 60 seconds later (t + 15 + 60).

There are 50 seconds during which all the servers are marked down and ALL requests fail: (t + 5 + 60) - (t + 15) = 50. The 50 seconds represent the time the RetryInterval expires on server 1 minus the time server 3 was marked down.
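
The timeline in this example can be reproduced with a minimal Python sketch. It assumes the request is tried on every cluster member (ServerIOTimeoutRetry of -1) and that each retry starts immediately after the previous timeout. The function name is hypothetical.

```python
def outage_window(server_io_timeout: int, cluster_size: int, retry_interval: int) -> int:
    """Seconds during which every server is marked down after one slow request.

    Hypothetical model: times are offsets in seconds from t, the time the
    original request is received.
    """
    timeout = abs(server_io_timeout)
    # Server i is marked down after i consecutive timeouts: t + 5, t + 10, ...
    marked_down = [timeout * i for i in range(1, cluster_size + 1)]
    # Each server may receive a request again once its RetryInterval expires.
    eligible_again = [t + retry_interval for t in marked_down]
    all_down_from = marked_down[-1]   # last server marked down
    first_back_at = eligible_again[0]  # first server eligible again
    return max(0, first_back_at - all_down_from)


# ServerIOTimeout -5, three servers, RetryInterval 60.
print(outage_window(-5, 3, 60))  # 50
```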

Resolution:

Prevent all servers from being marked down simultaneously by adjusting the RetryInterval value or the ServerIOTimeoutRetry value.

Using RetryInterval:

  • Modify the RetryInterval value from 60 to 9.

    ((number of servers - 1) * (absolute value of ServerIOTimeout)) - 1 = ((3 - 1) * 5) - 1 = 10 - 1 = 9

The servers will be marked down in the same manner as shown in the previous example.
An error is sent to the client after Server 2 is marked down (t + 10).
Server 1 can receive a request 9 seconds after it is marked down (t + 5 + 9) = t + 14.
Server 3 is not marked down until t + 15 so there is no time when all servers are marked down.

Using ServerIOTimeoutRetry:

  • The RetryInterval remains at the default value of 60.
  • Modify the ServerIOTimeoutRetry value to 2.

With these values, the request is sent to Server 1. Server 1 is marked down at time t + 5.
The request is retried to Server 2. Server 2 is marked down at time t + 10.
The request is not retried to any additional servers.
An error is sent to the client (t + 10).
Server 3 remains available to receive requests.
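
The effect of ServerIOTimeoutRetry on how many servers one slow request can mark down can be sketched as follows. The function name is hypothetical, and the mapping from the setting to the number of servers tried is an assumption inferred from this walkthrough (a value of 2 results in two servers being tried) and from the statement that a value of 0 means no retries.

```python
def servers_tried(cluster_size: int, server_io_timeout_retry: int) -> int:
    """Number of servers a single timed-out request is sent to.

    Hypothetical model inferred from this document, not plug-in code.
    """
    if server_io_timeout_retry == -1:
        # Default: retry up to the number of members in the cluster.
        return cluster_size
    if server_io_timeout_retry < -1:
        raise ValueError("values below -1 are not described in this document")
    # Assumption: the first attempt always happens, even when retries are off.
    return max(1, min(server_io_timeout_retry, cluster_size))


cluster_size = 3
tried = servers_tried(cluster_size, 2)
# Servers marked down by the request, and servers still available.
print(tried, cluster_size - tried)  # 2 1
```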



Document Information

Modified date:
21 November 2023

UID

swg21450051