Best practices for starting and stopping WebSphere Process Server (WPS) Version 7.0 and later

Question & Answer

Question

What are some of the best practices for operational procedures in starting and stopping WebSphere Process Server?

Answer

The operational procedures for a WebSphere Process Server environment depend on the complexity and nature of the environment topology and the various connected components.

This document is a sample guide that you can use to develop your own set of operational procedures to suit your environment. It contains best practices and procedures that apply to the typical WebSphere Process Server deployment environment. However, it is only intended as a starting point to develop procedures customized to your specific environment. It is not intended to be used as-is. It is important to thoroughly test the operational procedures before using them in production.

Additional documentation on Recovery Procedures is available in the IBM WebSphere Process Server Best Practices in Error Prevention Strategies and Solution Recovery Redbooks publication.

The general practice is to stop the inbound traffic to WebSphere Process Server prior to starting or stopping the server. This approach produces the least complexity during these operational procedures and reduces the chance of potential errors.

This document is split into the following sections:

Steps to follow for starting the server in a normal mode
Steps to follow for shutting down the servers gracefully
Steps to follow if servers are shut down immediately or abruptly in error

Steps to follow for starting the server in a normal mode

1. Stop the inbound traffic to WebSphere Process Server.

   a. If a front-end HTTP server is used in the WebSphere Process Server solution, stop the flow of incoming HTTP requests to WebSphere Process Server. This approach allows the server to complete its normal startup without a pending workload. It covers the JAX-WS, JAX-RPC with SOAP/HTTP, and HTTP bindings. With IBM HTTP Server, for example, you can do this by modifying the plug-in configuration. For more information, see the document "Configuring a temporary 'Site Down For Maintenance' page in IBM HTTP Server".

      Note: IBM HTTP Server controls only HTTP traffic. Other protocols might still produce work for the server.
      Note: If HTTP traffic was already stopped during a graceful shutdown of the servers, skip this step and move to step 1b.
   b. For JMS, MQ, MQJMS, and web services with SOAP/JMS export bindings, refer to your operational procedures on how to stop the applications that put request messages on the queues.

2. Start the servers in normal mode in the following order: deployment manager, node agents, messaging cluster, support cluster, and application target cluster.

   a. For messaging servers, verify that the messaging engines are started before starting other servers.
   b. For all servers, check the SystemOut.log file to verify that it does not contain any errors and that the server is "Open for e-business" before starting the next servers.

3. Start the inbound traffic to WebSphere Process Server.

   a. Enable the flow of incoming HTTP requests to the servers if it was stopped in step 1a.
   b. Enable the inbound flow of messages for JMS, MQ, MQJMS, and JAX-RPC with SOAP/JMS export bindings if they were stopped in step 1b.
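
The SystemOut.log check in step 2 can be scripted. The following Python sketch is only an illustration, not part of the product. It assumes the basic log layout of a bracketed timestamp, a thread ID, a component name, and a one-letter event type (E for error), and it looks for the "open for e-business" text in the message. The function name check_startup_log and the sample log lines are hypothetical. Pass in only the lines written since the current start, because the file can also contain entries from earlier runs.

```python
import re

# Assumed line layout: "[timestamp] threadId component T message", where T is
# a one-letter event type (E = error). Adjust the pattern if your logs differ.
EVENT_LINE = re.compile(r"^\[[^\]]+\]\s+\S+\s+\S+\s+([A-Z])\s+(.*)$")
READY_TEXT = "open for e-business"


def check_startup_log(lines):
    """Return (ready, errors) for an iterable of SystemOut.log lines."""
    ready = False
    errors = []
    for line in lines:
        match = EVENT_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # continuation lines such as stack traces
        event_type, message = match.groups()
        if event_type == "E":
            errors.append(message)
        if READY_TEXT in message.lower():
            ready = True
    return ready, errors


# Hypothetical sample lines, for illustration only.
sample = [
    "[6/15/18 10:02:10:000 EDT] 00000000 SomeComponent I   example informational line",
    "[6/15/18 10:02:11:000 EDT] 00000000 WsServerImpl  A   WSVR0001I: Server server1 open for e-business",
]
ready, errors = check_startup_log(sample)
print("ready:", ready, "errors:", len(errors))
```

Start the next server only when the result shows the server as ready and the error list is empty, or after you have reviewed the reported errors.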

Steps to follow for shutting down the servers gracefully

1. Stop the inbound traffic to WebSphere Process Server.

   a. If a front-end HTTP server is used, stop the flow of incoming HTTP traffic to WebSphere Process Server. See step 1a in the "Steps to follow for starting the server in a normal mode" section for details. This approach stops new work from starting and allows the server to complete in-progress work. HTTP traffic is synchronous, so it might take a long time to complete. This step allows you to end incoming flows earlier and reduce the shutdown time.
   b. For WebSphere MQ, MQJMS, JMS, and web services with SOAP/JMS bindings, pause the message processing by deactivating the J2CMessageEndpoints. Use the following sample.jacl script as a starting point and modify it to suit your environment:

      sample.jacl

      # Identify the target server (replace with your own cell, node, and cluster member names)
      set myCellName "MyCell"
      set myNodeName "MyNode"
      set myServerName "MyClusterMember"

      # Find all J2CMessageEndpoint MBeans on that server
      set J2CMessageEndpoint [$AdminControl queryNames cell=$myCellName,node=$myNodeName,type=J2CMessageEndpoint,process=$myServerName,*]

      puts "Check status of J2CMessageEndpoints. 1 = activated, 2 = deactivated, 3 = stopped"
      puts ""

      # Pause each endpoint and print its status before and after
      foreach endpoint $J2CMessageEndpoint {
          puts [$AdminControl invoke $endpoint getActivationProperties]
          puts ""
          puts "Before pause, Status = [$AdminControl invoke $endpoint getStatus]"
          puts ""
          puts [$AdminControl invoke $endpoint pause]
          puts "After pause, Status = [$AdminControl invoke $endpoint getStatus]"
          puts "--------------------------------"
          puts ""
      }

      Run the script from the <profile name>/bin directory. For example, use the following command to run the script:

      wsadmin.bat -f c:\sample.jacl -user admin -password admin

      Note: This approach should also help to control the WebSphere J2C adapter polling thread.

      Note: The IBM Jacl to Jython Conversion Assistant is available if you need a Jython version of the previous sample script.

2. Shut down all of the servers gracefully in the following order: application target cluster, support cluster, messaging cluster, node agents, and the deployment manager.
3. Check the log files to verify that all of the servers stopped cleanly.

If the servers cannot be stopped gracefully, collect the data described in the "Collect data manually" section of the MustGather: Application Server, dmgr, and nodeagent start and stop problems document. Also, see the list of MustGather documents and use the one that is most appropriate to your environment.
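
For readers who prefer Jython, the pause logic of the sample Jacl script can be sketched as follows. This is a hand-written illustration, not output of the Conversion Assistant, and it has not been verified against a specific wsadmin Jython level. The function name pause_endpoints is hypothetical. The admin_control argument is assumed to be the wsadmin AdminControl object, whose queryNames call is assumed to return a string of line-separated MBean names. The query string is the same one the Jacl sample uses.

```python
def pause_endpoints(admin_control, cell, node, server):
    """Pause every J2CMessageEndpoint on one server.

    Returns a list of (endpoint, status_before, status_after) tuples.
    """
    query = "cell=%s,node=%s,type=J2CMessageEndpoint,process=%s,*" % (
        cell, node, server)
    names = admin_control.queryNames(query)
    results = []
    for endpoint in names.splitlines():
        endpoint = endpoint.strip()
        if not endpoint:
            continue  # skip blank lines in the query result
        before = admin_control.invoke(endpoint, "getStatus")
        admin_control.invoke(endpoint, "pause")
        after = admin_control.invoke(endpoint, "getStatus")
        results.append((endpoint, before, after))
    return results
```

Inside a wsadmin session started with -lang jython, the call would look like pause_endpoints(AdminControl, "MyCell", "MyNode", "MyClusterMember"), using the placeholder names from the Jacl sample. Passing the AdminControl object as an argument keeps the function testable outside wsadmin with a stand-in object.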

Steps to follow if servers are shut down immediately or abruptly in error

1. Ask the database administrator to check for and resolve any in-doubt transactions in the database instance.
2. Increase the connection pool size for the WPSDB data source to 160 or higher. Start the deployment manager and the node agents in normal mode, and then change the connection pool size through the administrative console. For information on why the connection pool size needs to be higher, see the developerWorks document "Default behavior of managed connections in WebSphere Application Server".
3. Increase the global transaction timeout value to 300 seconds.

   In the administrative console, click Servers > Server Types > WebSphere application servers > server_name > Container Services > Transaction Service and change the values for Total transaction lifetime timeout and Maximum transaction timeout to 300. The server might need a longer transaction timeout if a large number of transactions must be recovered, to avoid rollbacks or transaction timeout failures.
4. Start all of the servers in recovery mode in the following order: messaging cluster, support cluster, and application target cluster. Use the following command:

   <Profile name>/bin/startServer.(bat|sh) <serverName> -recovery

   Note: The deployment manager was started normally in step 2 and does not need to be restarted.
5. Wait for the recovery to complete, and then reset the connection pool size and transaction timeout values to their original values.
6. Start all of the servers in normal mode in the following order: messaging cluster, support cluster, and application target cluster.

   a. For messaging servers, verify that the messaging engines are started before starting other servers.
   b. For all servers, check the SystemOut.log file to verify that it does not contain any errors and that the server is "Open for e-business" before starting the next servers.

7. Use the administrative console to determine whether any in-doubt transactions exist. Click Servers > Application Servers > server_name > Container Services > Transaction service > Runtime tab. If in-doubt transactions are listed, select all of them and initiate a rollback.
8. Check whether there are messages in the Retention Queue and Hold Queue of the Business Process Container. Click Servers > Application Servers > server_name > Business process container > Runtime Configuration. If there are any messages on the Retention Queue, the Hold Queue, or both, replay the messages. Replay all messages from the Retention Queue first.
9. Check the Failed Event Manager for any failed messages and take the appropriate action.
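
The cluster ordering used for the recovery-mode and normal-mode starts can be captured in a small planning helper. The following Python sketch only builds the startServer command lines in the documented order and does not run anything. The function name start_commands, the cluster keys, and the server names are hypothetical, and the script name assumes a UNIX-style profile (use startServer.bat on Windows).

```python
# Cluster start order for the recovery procedure, taken from the steps above.
CLUSTER_ORDER = ("messaging", "support", "application target")


def start_commands(members_by_cluster, recovery=False, script="startServer.sh"):
    """Build startServer command lines in cluster order, without running them."""
    commands = []
    for cluster in CLUSTER_ORDER:
        for server in members_by_cluster.get(cluster, []):
            command = [script, server]
            if recovery:
                command.append("-recovery")  # recovery-mode start
            commands.append(command)
    return commands


# Hypothetical cluster member names; replace them with your own.
members = {
    "application target": ["AppMember1", "AppMember2"],
    "messaging": ["MsgMember1"],
    "support": ["SupMember1"],
}
for command in start_commands(members, recovery=True):
    print(" ".join(command))
```

Calling the same function with recovery=False gives the command list for the later normal-mode start. Keeping the order in one constant reduces the chance that the two passes drift apart.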

Related Information

Solutions Recovery Best Practices Redbook

Product Synonym

WPS

Document Information

Modified date:
15 June 2018

UID

swg21610342
