MustGather: WebSphere DataPower SOA Appliance statistics gathering
A device is reporting high load, high CPU, high resource utilization, or slow responsiveness. What data should I collect as part of the MustGather process?
A device can experience high load, high CPU usage, or similar symptoms because of problems such as recursive style sheets, certain configurations, heavy use of system variables, or extensive debugging. This behavior can be analyzed to determine the cause of the problem. Start with these steps and questions:
- Has the device's First Failure Data Capture (FFDC) feature been configured and enabled? See the FFDC best practices: use the Most Detailed Error Report settings and make sure error reports are generated Always on Startup.
- Is this occurring on a single device?
- Is the problem behavior something I can replicate?
- Is the probe enabled? Is debug logging enabled?
- Enable Throttle Logging.
The main purpose of these questions is to determine whether this is a one-time event or a reproducible problem. If the problem can be reproduced, DataPower Support has a much better chance of locating its source.
If the problem has occurred once on a single device:
- If the issue is still affecting the device, collect several rounds of Command Line Interface (CLI) output, for example with the sample shell script described at the end of this document.
- Obtain all logs stored inside the logtemp:// directory.
- If the issue is related to high memory, remove the device from network activity and let it remain idle for at least 15 minutes. Monitor memory consumption to see whether it returns to a lower value.
- If possible, do not reboot. Instead, leave the device in the bad state and contact IBM DataPower Support.
If you can reproduce the problem, follow these instructions:
DataPower Support can use the following CLI output to better understand the problem. This collection begins to isolate the problem, but in some cases the problem must be simplified and reproduced further.
Ways to simplify the problem include:
- Reduce the number of active services and/or domains running when the problem occurs.
- If the behavior is a sudden spike, the most useful data is the traffic reaching the device at the time of the spike. A packet trace that captures the inbound traffic is ideal.
- At times IBM Support needs to reproduce the problem, which requires simulating sample client traffic and backend content. A packet trace helps with this.
Keep in mind that the diagnostic outputs are intended for IBM Support only and are not publicly documented.
- Run the following commands before, during, and after the problem is recreated (the more iterations over time, the better). The 'admin' user is required to collect this data because it is the only user with access to the diagnostics (diag) prompt.
show memory details
show activity 100
- Ideally, collect at least 5 CLI outputs. The collection interval depends on how easily the issue can be recreated. The default is 5-minute intervals, but if the issue develops over shorter periods, collect the CLI outputs more frequently. In production, do not collect CLI data more often than every 30 seconds.
- In production environments, if there is concern about impact, omit the 'show tcp-table', 'show gateway-transactions' and 'show handles' commands.
- Collect a debug log from the default domain and from the domain where the active service is running. If it is unclear which other domain to collect the debug log from, the default domain alone is a good start.
- Collect a device backup. This gives DataPower Support all domains, including the default domain, to work from as needed.
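The command list and the interval rules above can be captured in a small Bash sketch. It assumes, without confirmation from this document, that the DataPower CLI reached over SSH reads the user name and password as its first input lines and that `exit` leaves the diag prompt; verify both on your firmware. The function names `build_cli_batch` and `check_interval` are hypothetical. The three optional commands are the ones this document says to omit in production.

```shell
#!/usr/bin/env bash
# Sketch only, not an IBM-supported tool.
# Assumptions (verify on your device): the CLI session reads the user name
# and password as its first two input lines, "diag" enters the diagnostics
# prompt, and "exit" leaves the current mode (a second "exit" logs out).

# Commands listed in this document.
BASE_COMMANDS=("show memory details" "show activity 100")
# Commands this document says to omit in production if impact is a concern.
OPTIONAL_COMMANDS=("show tcp-table" "show gateway-transactions" "show handles")

# build_cli_batch USER PASSWORD PRODUCTION(yes|no)
# Prints the lines to feed into one CLI session, one command per line.
build_cli_batch() {
  local user="$1" pass="$2" production="$3"
  printf '%s\n' "$user" "$pass" "diag"
  printf '%s\n' "${BASE_COMMANDS[@]}"
  if [ "$production" != "yes" ]; then
    printf '%s\n' "${OPTIONAL_COMMANDS[@]}"
  fi
  # Leave the diag prompt, then end the session.
  printf '%s\n' "exit" "exit"
}

# check_interval SECONDS PRODUCTION(yes|no)
# Succeeds only if SECONDS is a positive integer and, in production,
# is not shorter than the 30-second minimum.
check_interval() {
  local seconds="$1" production="$2"
  case "$seconds" in
    ''|*[!0-9]*) return 1 ;;   # empty or not a whole number
  esac
  [ "$seconds" -gt 0 ] || return 1
  if [ "$production" = "yes" ] && [ "$seconds" -lt 30 ]; then
    return 1
  fi
  return 0
}
```

For example, `check_interval 300 yes && build_cli_batch admin "$PW" yes` would produce a production-safe batch for the default 5-minute interval, which could then be piped into an SSH session to the device. Keeping the optional commands in a separate array makes the production trade-off a single switch.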
When you are finished, you should have the following files:
- CLI output collection
- Device backup
- Packet traces of client or backend traffic going through the device
- Sample client input messages that trigger the condition
- Debug logs from the default domain and any other domains that span the same time as the CLI output capture
A shell script can help capture the CLI outputs from a device. The sample script provided with this document is commented and collects the outputs at a timed interval into files named with a unique filename plus timestamp; review it before use. You must modify it to use your device's IP address and login information.
Keep in mind that this is only a sample to assist with the collection and is not a supported script. The 'admin' user is required because the script must access the 'diag' (diagnostics) prompt. If you run the script as admin with an incorrect password, you could lock yourself out of the device. Adjust the "COUNT" and "sleep" values to match the problem behavior.
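The following is a hypothetical Bash sketch in the spirit of that sample script, not the IBM script itself. It runs the diagnostic commands COUNT times, sleeps between runs, and writes each run to a file named with a prefix, iteration number, and timestamp. The environment variables `DP_HOST`, `DP_USER`, `DP_PASS`, `COUNT`, `SLEEP_SECS`, and `PREFIX`, the helper `output_name`, and the way the login is passed to the CLI are all assumptions. Try it against a non-production device first, because a wrong admin password can lock the account.

```shell
#!/usr/bin/env bash
# Hypothetical collection loop; not supported by IBM. Adapt before use.
# Assumptions to verify for your environment:
#  - DP_HOST, DP_USER and DP_PASS are variables you set yourself.
#  - The DataPower CLI over SSH prompts for the user name and password
#    inside the session, so they are sent as the first input lines.
#  - "exit" leaves the diag prompt and a second "exit" ends the session.
# WARNING: a wrong admin password can lock you out of the device.

COUNT="${COUNT:-5}"              # number of collections; at least 5 is ideal
SLEEP_SECS="${SLEEP_SECS:-300}"  # 5-minute default; never below 30 in production
PREFIX="${PREFIX:-dp-cli}"       # hypothetical file name prefix

# output_name PREFIX ITERATION TIMESTAMP
# Builds a unique file name such as dp-cli-03-20240101-120000.txt.
output_name() {
  printf '%s-%02d-%s.txt\n' "$1" "$2" "$3"
}

collect() {
  local i file
  for ((i = 1; i <= COUNT; i++)); do
    file="$(output_name "$PREFIX" "$i" "$(date +%Y%m%d-%H%M%S)")"
    # Send login, enter diag, run the commands, then log out.
    printf '%s\n' "$DP_USER" "$DP_PASS" "diag" \
      "show memory details" "show activity 100" "exit" "exit" \
      | ssh -T "$DP_HOST" > "$file" 2>&1
    echo "wrote $file"
    # No need to wait after the final collection.
    if [ "$i" -lt "$COUNT" ]; then
      sleep "$SLEEP_SECS"
    fi
  done
}

# Only start collecting when a target device has been configured.
if [ -n "${DP_HOST:-}" ]; then
  collect
else
  echo "Set DP_HOST, DP_USER and DP_PASS to run the collection." >&2
fi
```

A run might look like `DP_HOST=192.0.2.10 DP_USER=admin DP_PASS='your-password' COUNT=10 SLEEP_SECS=60 bash collect.sh`, where `collect.sh` is a hypothetical file name. Passing the password through the environment keeps it out of the script, but it may still appear in shell history, so clear it afterward.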
These diagnostic commands are intrusive but necessary. To prevent known issues from causing additional complications, it is highly recommended that you run the latest firmware. To confirm that your firmware will not cause a problem during debugging, always check the release notes for your firmware level, which are linked from this document.
More support for:
IBM DataPower Gateways
Software version: 7.0.0, 7.1, 7.2, 7.5, 7.5.1, 7.5.2, 7.6
Operating system(s): Firmware
Software edition: Edition Independent
Reference #: 1377610
Modified date: 15 December 2010