MustGather: Performance, hang, or high CPU issues on Windows
Resolving The Problem
Before taking any other steps, please enable verbose garbage collection logging on the server instance if it is not already enabled. "Enabling verbose garbage collection (verbosegc) in WebSphere Application Server" http://www.ibm.com/support/docview.wss?rs=180&uid=swg21114927 A restart of the server is required to enable the logging.
This method will not work while the server is starting or is in any condition that prevents the wsadmin process from connecting to the server. If you are in a clustered environment, the server must show as active in the admin console.
wsadmin -lang jacl -javaoption -Xmx256M -f <path to file>\windowsperf.jacl servername <nodename if needed>
Replace servername with the name of the server that has the high CPU or performance problem, and add the nodename if needed. Normally, the servername will suffice, but the nodename is needed if the servername is repeated on different nodes. Run the script on the server that has the performance problem and supply the correct WebSphere Application Server ID and password if security is enabled on the server. These can be supplied in the popup window.
The script will connect to the WebSphere Application Server and verify that it is running. It will then trigger and collect javacores, tasklist, netstat, and other performance information in the \temp\pmr\logs\<server name> directory. During execution, the script will update the screen with each iteration of the collection process. A success message should appear when the script completes.
Once the script completes, collect ALL the process logs for the server from the usual log file directory, all the logs from the \temp\pmr\logs\<servername> directory, and the javacores (by default it will generate 12 javacores at an interval of 30 seconds) from the <WebSphere Profile root directory>, and upload them to the PMR. Note: Each execution of the script overwrites the files in the \temp\pmr\logs\<servername> directory.
Note: The number of iterations and the pause time between collections are variables defined in the first two lines of the script. The iters variable is the number of iterations and the ptime variable is the pause time.
The defaults are:
set iters 12 (generate 12 javacores)
set ptime 30 (at an interval of 30 seconds)
These defaults are most useful in finding performance issues, high CPU, and hung threads for most environments, but they can be edited to provide more iterations or different pause times. This may be requested by IBM support in some cases.
If the windowsperf.jacl tool fails or if support requests it, please perform the following steps to collect data manually.
Complete the Collecting Data manually instructions during the problem to collect the information and then use the Exchanging Information Section to share the information with IBM support.
The verboseGC data is critical to analyze a performance problem. If you have not already done so, enable verboseGC and restart the server.
Important note: Step 2 below involves installing the chosen CPU data collection tool. Make sure to read that step and install the tool on the problem server before the problem occurs.
At the time of the problem:
1. Take the output of the netstat command to get information about TCP/IP sockets:
netstat -oan > netstat_before.out
2. If you are seeing high CPU usage: Start collecting the CPU data. In most cases, the TPROF For Windows tool gives complete and granular CPU data, so it is our preferred tool. Please follow the steps given in TPROF For Windows tool to start collecting the CPU data. Note: TPROF does not install and run on Windows 2012, so an alternative is required.
If it is not possible to use the preceding tool, then here are the other tools available to collect CPU data:
Perfmon (Windows XP/Windows 2003/Windows 2008/Windows 2012/Windows 7/Windows 8)
Pslist - available in the Sysinternals package from Microsoft and preferred if TPROF is not available
3. Download the file windows_hang.py and copy it to your <PROFILE_ROOT>\bin directory. If it is instead copied to <WAS_HOME>\bin, the default server, which may be the deployment manager (dmgr), will be accessed when wsadmin.bat is launched.
NOTE: This script works for WebSphere Application Server 6.1 and higher.
If you are looking for the older windows_hang.bat that works with older releases of WebSphere Application Server, see the FAQ section.
To launch the script to produce the default three javacores spaced 2 minutes apart, run this command:
wsadmin -lang jython -f windows_hang.py -j -s servername
Replace servername with your server's name.
If a specific SOAP port needs to be connected to:
wsadmin -port PORT -lang jython -f windows_hang.py -j -s servername
If a specific hostname and SOAP port need to be connected to:
wsadmin -host HOST_NAME -port PORT -lang jython -f windows_hang.py -j -s SERVER_NAME
Where HOST_NAME is either localhost or a valid hostname or IP address
Where PORT is the defined SOAP port used by the application server or deployment manager that is being connected to.
This script cannot be used while the application server is starting up (i.e. before the "e-business" message is seen in the SystemOut.log). This is due to the requirement that an active SOAP connection has to be established through wsadmin.
Alternative steps include collecting raw core dumps using userdump.exe or (on Windows Vista/2008 or later) opening the Task Manager, right-clicking the java process, and selecting Create Dump File from the context menu. See the manual steps (and FAQ) in the Crash MustGather to properly configure full core dumps and to learn how to process any raw core dumps.
Further information about the script:
- To adjust the quantity of javacores produced, use the option "-r".
- To adjust the time between each javacore produced, use the option "-i" and provide the number of seconds.
If you are trying to run the script for a set period of time, you will need to calculate -i and -r separately. For example, if you wanted to run the script for 900 seconds with javacores generated every 3 minutes (180 seconds), you would divide 900 seconds by 180 seconds to determine the setting for -r (the iterations, or number of javacores to produce). In this case, it is 5, and the command would be:
wsadmin -host HOST_NAME -port SOAP_PORT -lang jython -f windows_hang.py -j -i 180 -r 5 -s SERVERNAME
All the script arguments (such as -j, -i, -r, and -s) are added after the -f windows_hang.py option. Any arguments added before -f are reserved for wsadmin.bat (such as -lang, -host, and/or -port).
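The -r calculation and the argument ordering can be sketched in a few lines of Python. This is only an illustration: the helper names javacore_iterations and build_hang_command are hypothetical and are not part of windows_hang.py or wsadmin. The sketch assumes that a duration that is not an exact multiple of the interval is rounded down.

```python
def javacore_iterations(total_seconds, interval_seconds):
    """Return the -r value: how many javacores fit in the requested period."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Whole intervals only (assumption: round down); take at least one javacore.
    return max(1, total_seconds // interval_seconds)


def build_hang_command(server_name, total_seconds, interval_seconds,
                       host=None, port=None):
    """Assemble the wsadmin command line as a list of arguments."""
    cmd = ["wsadmin"]
    # Arguments before -f belong to wsadmin itself.
    if host is not None:
        cmd += ["-host", str(host)]
    if port is not None:
        cmd += ["-port", str(port)]
    cmd += ["-lang", "jython", "-f", "windows_hang.py"]
    # Arguments after -f are passed to windows_hang.py.
    iterations = javacore_iterations(total_seconds, interval_seconds)
    cmd += ["-j", "-i", str(interval_seconds), "-r", str(iterations),
            "-s", server_name]
    return cmd


if __name__ == "__main__":
    # Reproduces the 900-second / 180-second example from the text.
    print(" ".join(build_hang_command("SERVERNAME", 900, 180,
                                      host="HOST_NAME", port="SOAP_PORT")))
```

Keeping the wsadmin options and the script options in separate groups mirrors the rule that everything before -f is consumed by wsadmin.bat.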
4. Follow the steps given in TPROF For Windows tool (or the other tool you chose in step 2) to stop collecting the CPU data.
5. Take the final output of the netstat command to get information about TCP/IP sockets:
netstat -oan > netstat_after.out
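For a quick local look at how the socket picture changed between the two netstat captures, a small Python sketch such as the following could be used. It is not part of the MustGather tooling. The function names are hypothetical, and the sketch assumes the usual five-column TCP rows of netstat -oan (protocol, local address, foreign address, state, PID).

```python
from collections import Counter


def count_tcp_states(netstat_text):
    """Count TCP connections per state in `netstat -oan` output."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed TCP row: Proto, Local Address, Foreign Address, State, PID.
        # Header lines and UDP rows (which have no state) are skipped.
        if len(fields) == 5 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


def state_changes(before_text, after_text):
    """Return {state: after_count - before_count} for states that changed."""
    before = count_tcp_states(before_text)
    after = count_tcp_states(after_text)
    changes = {}
    for state in sorted(set(before) | set(after)):
        delta = after[state] - before[state]
        if delta != 0:
            changes[state] = delta
    return changes
```

The functions take the file contents as text, so the contents of netstat_before.out and netstat_after.out can be read and passed in. A growing count of one state between the two captures is a hint worth mentioning to support, but both raw files should still be submitted.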
Submitting required data:
Zip all the output and log files:
- netstat output (per #1 and #5 above)
- CPU data (per #2 and #4 above)
- All the generated javacores (per #3 above)
- Server logs from the server having problems (<PROFILE_ROOT>\logs\<MY_SERVER>\)
Send the results to IBM Support.
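The file list above can be gathered into a single archive with a short Python sketch like the one below. The function name bundle_mustgather and its parameters are hypothetical. It assumes the javacores were written to the profile root with names starting with "javacore", and that the server logs are under <PROFILE_ROOT>\logs\<MY_SERVER>\ as listed above.

```python
import glob
import os
import zipfile


def bundle_mustgather(zip_path, profile_root, server_name, extra_files=()):
    """Zip javacores, server logs, and extra files (netstat, CPU data)."""
    files = list(extra_files)
    # Assumption: javacores are written to the profile root.
    files += glob.glob(os.path.join(profile_root, "javacore*"))
    # Server logs: <PROFILE_ROOT>\logs\<MY_SERVER>\
    log_dir = os.path.join(profile_root, "logs", server_name)
    for folder, _subdirs, names in os.walk(log_dir):
        files += [os.path.join(folder, name) for name in names]
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Skip anything that is missing or is not a regular file.
            if os.path.isfile(path):
                archive.write(path)
                archived.append(path)
    return archived
```

The netstat and CPU data files are passed through extra_files because their location depends on where the commands were run. Write the archive outside the log directory so that it is not picked up by a later run.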
Frequently Asked Questions (FAQs):
- What is the impact of enabling verboseGC?
VerboseGC data is critical to diagnosing these issues. This can be enabled on production systems because it has a negligible impact on performance (< 2%).
- What are 'javacores' and where do I find them?
Javacores are snapshots of the Java™ Virtual Machine activity and are essential to troubleshooting these issues. These files are usually found in the profile_root; otherwise, search the entire system for "*javacore*".
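If the javacores are not in the profile_root, the search for "*javacore*" can be scripted. The following Python sketch uses a hypothetical helper name, find_javacores, and walks a directory tree chosen by the caller.

```python
import fnmatch
import os


def find_javacores(search_root):
    """Return paths under search_root whose file name contains 'javacore'."""
    matches = []
    for folder, _subdirs, names in os.walk(search_root):
        # Same pattern as the manual search: *javacore*
        for name in fnmatch.filter(names, "*javacore*"):
            matches.append(os.path.join(folder, name))
    return sorted(matches)
```

Starting the search at the profile root or the WebSphere installation directory is usually much faster than scanning a whole drive.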
- How to check the SOAP port of the server?
Check the value of SOAP_CONNECTOR_ADDRESS in the serverindex.xml file located under <PROFILE_ROOT>\config\cells\cell_name\nodes\node_name
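The lookup can also be scripted. The following Python sketch assumes the usual serverindex.xml layout, in which each serverEntries element carries a serverName attribute and contains specialEndpoints elements with an endPointName attribute and a nested endPoint element holding the port. The function name soap_ports is hypothetical. Verify the result against the file itself if the layout differs in your release.

```python
import xml.etree.ElementTree as ET


def soap_ports(serverindex_source):
    """Map each server name to its SOAP_CONNECTOR_ADDRESS port.

    serverindex_source is a path to serverindex.xml or an open file object.
    """
    ports = {}
    root = ET.parse(serverindex_source).getroot()
    # Assumed layout: serverEntries > specialEndpoints > endPoint
    for entry in root.iter("serverEntries"):
        for endpoint in entry.iter("specialEndpoints"):
            if endpoint.get("endPointName") != "SOAP_CONNECTOR_ADDRESS":
                continue
            end_point = endpoint.find("endPoint")
            if end_point is not None and end_point.get("port"):
                ports[entry.get("serverName")] = int(end_point.get("port"))
    return ports
```

The returned dictionary lists every server defined on that node, which helps when the port of the deployment manager or node agent is needed rather than that of the application server.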
- If either script fails, can I still collect javacores manually via wsadmin?
Follow these manual steps to collect the javacores:
- From the command prompt, enter the command to get a wsadmin command prompt:
If security is enabled or the default SOAP ports have been changed, you will need to pass additional parameters to the batch file in order to get a wsadmin prompt. For example:
wsadmin.bat [-host host_name] [-port port_number] [-user username [-password password]]
Note: You can connect wsadmin to any of the server JVMs in the cell. After you run the wsadmin command, it displays the server process to which it has attached. The process it has attached to determines which JVMs you can get thread dumps for. If wsadmin is connected to the deployment manager, you can get thread dumps for any JVM in that cell. If it is attached to a node agent, you can get thread dumps for any JVM in that node. If it is attached to a server, you can get thread dumps only for that server.
- Get a handle to the problem application server.
Note: The contents in brackets "[.....]", along with the brackets, are not optional. They must be entered to set the jvm object. Also, note that there is a space between the words "completeObjectName" and "type":
wsadmin> set jvm [$AdminControl completeObjectName type=JVM,process=problemServerName,*]
Where problemServerName is the name of the application server that does not respond (or is hung). If wsadmin is connected to a Deployment Manager and the server names in the cell are not unique, you can qualify the JVM with the node attribute (node=nodeName) in addition to process.
- Generate multiple javacores by issuing the following command every 2 minutes for 3 iterations:
wsadmin> $AdminControl invoke $jvm dumpThreads
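The same manual steps can be sketched as a loop in Python. The function below is a hypothetical helper and is not part of any IBM script. It expects an object that offers completeObjectName and invoke, which is the role the AdminControl object plays in a wsadmin session started with -lang jython. The defaults of 3 iterations and 120 seconds follow the text above. The sleep parameter exists only so that the pause can be replaced in a dry run.

```python
import time


def collect_javacores(admin_control, server_name, iterations=3,
                      interval_seconds=120, sleep=time.sleep):
    """Request `iterations` javacores from one server, pausing in between."""
    # Same lookup as the "set jvm [...]" step above.
    jvm = admin_control.completeObjectName(
        "type=JVM,process=%s,*" % server_name)
    if not jvm:
        raise RuntimeError("No running JVM MBean found for %s" % server_name)
    for i in range(iterations):
        # Equivalent of: $AdminControl invoke $jvm dumpThreads
        admin_control.invoke(jvm, "dumpThreads")
        # No need to wait after the final javacore.
        if i < iterations - 1:
            sleep(interval_seconds)
    return iterations
```

Raising an error when the lookup returns an empty name avoids silently invoking dumpThreads on nothing, which is the usual symptom of a misspelled server name or of wsadmin being attached to the wrong process.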
- Is there another way to gather the required data?
- How to analyze the Java thread dumps?
Download the IBM Thread and Monitor Dump Analyzer for Java Technology.
ThreadAnalyzer is a technology preview tool that can analyze thread dumps from WebSphere Application Server. It is useful for identifying deadlocks, contention, bottlenecks, and to summarize the state of threads within WebSphere Application Server.
- Where is the old windows_hang.bat?
The old script is located here, although there are limitations with this script as you are required to run this against the individual application server. Running this script with wsadmin running through the dmgr might cause this to fail.
Download the attached script (windows_hang.bat) under <PROFILE_ROOT>\bin folder.
This script automatically generates 3 javacores at 2-minute intervals. Before running the script, check the following:
- Name of the problematic server(s)
- If admin security is enabled, get the username/password.
- Check which SOAP port is in use, as you will be required to enter it interactively when running the script.
- For each of the problematic server(s) open a command prompt and go to profile_root\bin. Enter the following command to start the script:
windows_hang.bat [problem servername]
The script will prompt for the admin security and the SOAP port. It will then generate 3 javacores, 2 minutes apart. Once done, you should see the following message and 3 javacores in the <profile_root> directory:
"MustGather>> Last javacore generation Successful. Script will now exit"
- How to change the default time interval for javacore generation in the older windows_hang.bat script?
Edit the TIME_SLEEP variable in the batch file. This variable accepts the time in milliseconds.
- What if I am using WebSphere Application Server 5.1? Where are the server logs?
For WebSphere Application Server 5.1 the server logs will be here:
If asked to do so:
The preceding data is used to troubleshoot most of these issues; however, in certain situations Support may need additional data. Only collect the following data if asked to do so by IBM Support.
Follow instructions in MustGather: Getting user.dmp when hangs/performance degradation prevents generating a javacore to produce a set of three user.dmp files taken at 2 minute intervals.
For a listing of all technotes, downloads, and educational materials specific to a hang or performance degradation, search the WebSphere Application Server support site.
How to enable verbosegc for WebSphere
IBM Thread and Monitor Dump Analyzer
Steps to getting support for WebSphere Application Server
Submitting information to IBM support
MustGather: Read first for WebSphere Application Server
Troubleshooting guide for WebSphere Application Server
Not getting javacores? Instructions to get user.dmp.
To diagnose or identify a problem, it is sometimes necessary to provide Technical Support with data and information from your system. In addition, Technical Support might also need to provide you with tools or utilities to be used in problem determination. You can submit files using one of following methods to help speed problem diagnosis:
Related information
Recording your screen to share with IBM Support
WebSphere Application Server - Express | Hangs/performance degradation | Windows | 7.0, 6.1, 6.0, 5.1
Runtimes for Java Technology | Java SDK | Windows
More support for:
WebSphere Application Server
Component: Hangs/Performance Degradation
Software version: 6.1, 7.0, 8.0, 8.5, 8.5.5, 220.127.116.11
Operating system(s): Windows
Software edition: Base, Express, Network Deployment
Reference #: 1111364
Modified date: 19 April 2019