IBM Support

MustGather: Performance, hang, or high CPU issues on Windows

Troubleshooting


Problem

If you are experiencing performance degradation, hang, no response, hung threads, CPU starvation, high CPU utilization, network delays, or deadlocks, this MustGather assists you in collecting the critical data that is needed to troubleshoot your issue.

Resolving The Problem

Before taking any other steps, enable verboseGC because it is critical for analyzing performance data.
 

Preferred Method

This method will not work while the server is starting or is in any condition that prevents the wsadmin process from connecting to the server. In a clustered environment, the server must show as active in the admin console. Download and unpack the windowsperf.zip file from the link below. The windowsperf.py script is compatible with WebSphere Application Server version 8.5.5.x or higher.

windowsperf.zip

To invoke the script, use the following command from the bin directory of the profile running the WebSphere Application Server.
wsadmin -lang jython -f <path to file>\windowsperf.py server_name1 server_name2
To include nodes, enclose each node name, a colon, and that node's space-separated server names in one pair of quotation marks. To add another node, add another quoted group after a space.
wsadmin -lang jython -f <path to file>\windowsperf.py "node_name1:server_name1 server_name2" "node_name2:server_name3"

Replace server_name with the name of the server that has the high CPU or performance problem, and add the node_name if needed. Normally, the server_name suffices, but the node_name is needed if the same server_name exists on different nodes. Run the script on the server that has the performance problem and, if security is enabled on the server, supply the correct WebSphere Application Server ID and password. These can be entered in the pop-up window.
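As an illustration of the argument format described above, the following Python sketch assembles the command line from a list of servers and, optionally, a mapping of nodes to servers. The function name build_windowsperf_command and its parameters are hypothetical and are not part of windowsperf.py; the sketch also assumes the script path contains no spaces.

```python
def build_windowsperf_command(script_path, servers=(), servers_by_node=None):
    """Return a wsadmin command line for windowsperf.py (illustrative only).

    servers          -- server names passed without a node
    servers_by_node  -- hypothetical mapping of node name -> list of server names
    """
    parts = ["wsadmin", "-lang", "jython", "-f", script_path]
    # Servers given without a node are separate, unquoted arguments.
    parts.extend(servers)
    # Each node and its servers form ONE quoted argument: "node:server1 server2".
    for node, node_servers in (servers_by_node or {}).items():
        parts.append('"%s:%s"' % (node, " ".join(node_servers)))
    return " ".join(parts)


print(build_windowsperf_command(
    r"C:\temp\windowsperf.py",
    servers_by_node={
        "node_name1": ["server_name1", "server_name2"],
        "node_name2": ["server_name3"],
    },
))
```

The printed command has the same shape as the node example above, with one quoted group per node.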

The following default values can be edited to provide more iterations, different pause times, or OS data collection, but the defaults are the most useful for finding performance issues, high CPU, and hung threads in most environments.

ITERATIONS = 8
DELAY = 30
COLLECTING_OS_DATA = 0


The script connects to the WebSphere Application Server and verifies that it is running. It then produces javacores and, if COLLECTING_OS_DATA is set to 1, generates tasklist, netstat, and other performance information in the C:\temp\windowsperf.RESULTS.<date>.<time> directory, which is then compressed into a zip file with the same name as the directory. During execution, the script updates the screen with each iteration of the collection process. A success message should appear when the script completes.
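When changing ITERATIONS or DELAY, it can help to estimate how long the collection will pause in total. The following sketch is only a rough estimate: it assumes DELAY is a pause in seconds between iterations, it does not know whether the script also pauses after the last iteration (so it reports both cases), and it ignores the time needed to write each javacore. The function name is hypothetical.

```python
ITERATIONS = 8   # default from the script
DELAY = 30       # default pause, assumed to be in seconds


def pause_time_bounds(iterations, delay):
    """Return (lower, upper) total pause time in seconds.

    lower: pauses only between iterations
    upper: one pause after every iteration, including the last
    """
    lower = (iterations - 1) * delay
    upper = iterations * delay
    return lower, upper


print(pause_time_bounds(ITERATIONS, DELAY))  # (210, 240)
```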

Once the script completes, zip and upload all the following data for the problematic server(s):
  • C:\temp\windowsperf.RESULTS.<date>.<time>.zip (if COLLECTING_OS_DATA is set to 1)
  • WebSphere Application Server logs (SystemOut.log, native_stderr.log, etc.)
  • javacores
  • server.xml (profile_root\config\cells\cell_name\nodes\node_name\servers\server_name\server.xml)
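The files listed above can be bundled with any zip tool. As one possible approach, the following Python sketch adds a list of files and directories to a single archive and skips anything that does not exist. The function name and all paths are hypothetical. The archive should be written outside the directories being collected, and files passed individually should have distinct names because they are stored by file name only.

```python
import os
import zipfile


def bundle_for_upload(zip_path, sources):
    """Add each existing file or directory in sources to zip_path.

    Returns the archive member names that were written.
    """
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for source in sources:
            if os.path.isfile(source):
                # Single files are stored under their file name only.
                name = os.path.basename(source)
                archive.write(source, name)
                written.append(name)
            elif os.path.isdir(source):
                # Directories are stored with paths relative to their parent.
                parent = os.path.dirname(os.path.abspath(source))
                for root, _dirs, files in os.walk(source):
                    for file_name in files:
                        full = os.path.abspath(os.path.join(root, file_name))
                        name = os.path.relpath(full, parent)
                        archive.write(full, name)
                        written.append(name)
            # Paths that do not exist are skipped silently.
    return written
```

For example, bundle_for_upload(r"C:\temp\upload.zip", [logs_dir, server_xml_path]) would collect a log directory and a server.xml file, where logs_dir and server_xml_path are placeholders for the actual locations.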


If the windowsperf.py tool fails, or if IBM Support requests it, proceed to the Collecting data manually section below.

Collecting data manually

Important note: Step 2 below involves installing the chosen CPU data collection tool. Read that step and install the tool on the problem server before the problem occurs.

At the time of the problem:

1. Take the output of netstat command to get information about TCP/IP sockets:
netstat -oan > netstat_before.out
2. If you are seeing high CPU usage: Start collecting the CPU data. In most cases, the TPROF For Windows tool gives complete and granular CPU data, so it is our preferred tool. Follow the steps given in TPROF For Windows tool to start collecting the CPU data. Note: TPROF does not install or run on Windows 2012, so an alternative is required there.
3. Download the file windows_hang.py and copy the file to your <PROFILE_ROOT>\bin directory. If the file is instead copied to <WAS_HOME>\bin, launching wsadmin.bat accesses the default server, which can be the deployment manager.
NOTE: This script works for WebSphere Application Server 6.1 and higher.
windows_hang.py
To launch the script to produce the default three javacores spaced 2 minutes apart, run the following command, replacing server_name with your server's name:
    wsadmin -lang jython -f windows_hang.py -j -s server_name
If a specific SOAP port needs to be connected to:
    wsadmin -port PORT -lang jython -f windows_hang.py -j -s server_name
If a specific hostname and SOAP port need to be connected to:
    wsadmin -host HOST_NAME -port PORT -lang jython -f windows_hang.py -j -s SERVER_NAME
HOST_NAME is either localhost or a valid hostname or IP address.
PORT is the defined SOAP port used by the application server or deployment manager that is being connected to.
This script cannot be used while the application server is starting up (i.e. before the "e-business" message is seen in the SystemOut.log). This is due to the requirement that an active SOAP connection has to be established through wsadmin.
Further information about the script:
  • To adjust the quantity of javacores produced, use the option "-r".
  • To adjust the time in-between each javacore produced, use the option "-i" and provide the number of seconds.
    If you are trying to run the script for a set period of time, you will need to calculate -i and -r separately. For example, to run the script for 900 seconds with a javacore generated every 3 minutes (180 seconds), divide 900 seconds by 180 seconds to determine the setting for -r (the iterations, or number of javacores to produce). In this case, the result is 5, and the command would be:
        wsadmin -host HOST_NAME -port SOAP_PORT -lang jython -f windows_hang.py -j -i 180 -r 5 -s server_name
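The calculation above can be sketched in Python as follows. The helper names iterations_for and build_hang_command are hypothetical and are not part of windows_hang.py; they only reproduce the arithmetic and the option order shown in the example command.

```python
def iterations_for(total_seconds, interval_seconds):
    """Return the -r value: how many javacores fit in the total run time."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Whole intervals only; always produce at least one javacore.
    return max(1, total_seconds // interval_seconds)


def build_hang_command(server_name, total_seconds, interval_seconds,
                       host=None, port=None):
    """Assemble the wsadmin command line for windows_hang.py."""
    parts = ["wsadmin"]
    # wsadmin's own options go before -f.
    if host is not None:
        parts += ["-host", str(host)]
    if port is not None:
        parts += ["-port", str(port)]
    # Script options go after -f windows_hang.py.
    parts += ["-lang", "jython", "-f", "windows_hang.py", "-j",
              "-i", str(interval_seconds),
              "-r", str(iterations_for(total_seconds, interval_seconds)),
              "-s", server_name]
    return " ".join(parts)


print(build_hang_command("server_name", 900, 180,
                         host="HOST_NAME", port="SOAP_PORT"))
```

With 900 seconds and a 180-second interval, the sketch yields -r 5, matching the command above.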
All the arguments below are added after the -f windows_hang.py option. Any arguments added before -f are reserved for wsadmin.bat (such as -lang, -host, or -port).
| Arguments | Default Value | Description | Required |
| --serverName, -s | | The problematic application server name. This is not the same as the profile name or the host name of the physical machine. Case-sensitive. | Yes |
| --nodeName, -n | | The problematic application server's node. This is not the same as the profile name or the host name of the physical machine. Case-sensitive. | Optional; use if multiple nodes are defined or running the script against the deployment manager. |
| --javacore, -j | disabled | Enables the generation of multiple javacores. | Yes, if you want to capture javacores. |
| --interval, -i | 120 (seconds) | Javacore generations are spaced apart the number of seconds defined here. | No |
| --iterations, -r | 3 | This defines the quantity of javacores to produce. | No |
| --heapdump, -d | disabled | Enables the generation of a single heapdump. | No |
| --multiple, -m | disabled | Enables the generation of multiple heapdumps. Use -i to control the quantity. | No |
| --help | | Displays a help page. Note the two dashes. | No |
4. Follow the steps given in TPROF For Windows tool (or the other tool you chose in step 2), to stop collecting the CPU data.
5. Take the final output of netstat command to get information about TCP/IP sockets:
    netstat -oan > netstat_after.out
6. Zip and upload the following files to IBM Support:
  • netstat output (per #1 and #5 above)
  • CPU data (per #2 and #4 above)
  • All the generated javacores (per #3 above)
  • Server logs from the server having problems (<PROFILE_ROOT>\logs\<MY_SERVER>\)
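Before uploading, the two netstat captures from steps 1 and 5 can be compared for a quick view of how socket states changed during the problem. The following Python sketch counts TCP rows by state and reports the differences. It assumes the usual five-column TCP row layout of netstat -oan on Windows (protocol, local address, foreign address, state, PID); the function names are hypothetical.

```python
from collections import Counter


def count_tcp_states(netstat_text):
    """Count TCP connections by state in netstat -oan output."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows: proto, local address, foreign address, state, PID.
        # UDP rows have no state column and are ignored.
        if len(fields) == 5 and fields[0] == "TCP":
            states[fields[3]] += 1
    return states


def state_changes(before_text, after_text):
    """Return {state: after_count - before_count} for states that changed."""
    before = count_tcp_states(before_text)
    after = count_tcp_states(after_text)
    return {
        state: after[state] - before[state]
        for state in set(before) | set(after)
        if after[state] != before[state]
    }
```

Reading netstat_before.out and netstat_after.out as text and passing both to state_changes gives a dictionary such as a growing CLOSE_WAIT count, which can be a useful first hint while waiting for a full analysis.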

Frequently asked questions (FAQs):

What is the impact of enabling verboseGC?

VerboseGC data is critical to diagnosing these issues. This can be enabled on production systems because it has a negligible impact on performance (< 2%).

What are 'javacores' and where do I find them?

Javacores are snapshots of the Java™ Virtual Machine activity and are essential to troubleshooting these issues. These files are usually found in the profile_root directory; if they are not there, search the entire system for "*javacore*".
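As one way to run that search, the following Python sketch walks a directory tree and lists every file whose name matches "*javacore*". The function name is hypothetical, and the starting directory is whatever location you choose to search, for example the profile root.

```python
import fnmatch
import os


def find_javacores(root):
    """Return a sorted list of files under root whose name contains 'javacore'."""
    matches = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            # Compare in lower case so the match is not case-sensitive.
            if fnmatch.fnmatch(name.lower(), "*javacore*"):
                matches.append(os.path.join(folder, name))
    return sorted(matches)
```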

How do I check the SOAP port of the server?

Check the value of SOAP_CONNECTOR_ADDRESS in the serverindex.xml file located under \config\cells\cell_name\nodes\node_name
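The lookup can also be scripted. The following Python sketch reads the SOAP_CONNECTOR_ADDRESS port for each server entry. It assumes a layout in which each serverEntries element contains specialEndpoints elements with an endPointName attribute and a child endPoint element carrying the port; the SAMPLE document is a simplified, made-up illustration of that assumed layout, so verify it against your own serverindex.xml. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up sample of the assumed serverindex.xml layout.
SAMPLE = """<serverindex:ServerIndex
    xmlns:serverindex="http://www.ibm.com/websphere/appserver/schemas/5.0/serverindex.xmi"
    hostName="host1">
  <serverEntries serverName="server1">
    <specialEndpoints endPointName="BOOTSTRAP_ADDRESS">
      <endPoint host="host1" port="2809"/>
    </specialEndpoints>
    <specialEndpoints endPointName="SOAP_CONNECTOR_ADDRESS">
      <endPoint host="host1" port="8880"/>
    </specialEndpoints>
  </serverEntries>
</serverindex:ServerIndex>"""


def soap_ports(serverindex_xml_text):
    """Return {server name: SOAP port} for every server entry found."""
    root = ET.fromstring(serverindex_xml_text)
    ports = {}
    for entry in root.iter("serverEntries"):
        for endpoint in entry.iter("specialEndpoints"):
            if endpoint.get("endPointName") != "SOAP_CONNECTOR_ADDRESS":
                continue
            end_point = endpoint.find("endPoint")
            if end_point is not None and end_point.get("port"):
                ports[entry.get("serverName")] = int(end_point.get("port"))
    return ports


print(soap_ports(SAMPLE))  # {'server1': 8880}
```

The resulting port is the value to pass to wsadmin with the -port option.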

If either script fails, can I still collect javacores manually via wsadmin or admin console?

Yes, there are alternative ways to collect javacores manually. See the Generating Javacores and Userdumps Manually For Performance, Hang or High CPU Issues on Windows page.

How do I analyze the Java thread dumps?

Download the IBM Thread and Monitor Dump Analyzer for Java Technology.

ThreadAnalyzer is a technology preview tool that can analyze thread dumps from WebSphere Application Server. It is useful for identifying deadlocks, contention, and bottlenecks, and for summarizing the state of threads within WebSphere Application Server.

Where is the windowsperf.jacl script?

This old script works for WebSphere Application Server 6.1 or higher.

windowsperf.jacl

wsadmin -lang jacl -javaoption -Xmx256M -f <path to file>\windowsperf.jacl server_name <node_name if needed>

Exchanging data with IBM Support

To diagnose or identify a problem, it is sometimes necessary to provide Technical Support with data and information from your system. In addition, Technical Support might need to provide you with tools or utilities to be used in problem determination. You can submit files using one of the following methods to help speed problem diagnosis:


Document Information

Modified date:
16 June 2023

UID

swg21111364