Your WBI Server Express system on iSeries intermittently becomes unresponsive, but later resumes work without any manual intervention
Your WBI Server Express system seems to be hung
The work pool in which your WBI Server Express system runs does not have enough memory. If the applications running in the work pool need more memory than you configured, the system starts swapping to disk. If the Java™ Virtual Machine (JVM) of WBI Server Express then starts a Garbage Collection (GC) cycle, all other threads of the already very slow JVM are paused until the cycle is finished. Under severely constrained resources, a GC cycle can easily take 10 minutes or more to complete. During this time, your WBI Server Express system appears unresponsive: for example, System Manager, Flow Manager, or Connectors may be able to start, but appear to hang when connecting to WBI Server Express.
Diagnosing the problem
To determine whether you are affected by the problem described above, refer to the following technote first. Although it is written for WebSphere Application Server on iSeries, it applies to any JVM on iSeries. The JVM used for WBI Server Express is 1.4.2 Classic.
MustGather: Overview of Application Server, Node Agent, and Deployment Manager Hang Problems for IBM i (i5/OS)
Collect the following data as explained:
1) Information for at least one Garbage Collection (GC) cycle
2) Several JVM dumps
Once you have all this information, check the GC information first. How long did the last GC cycle(s) take?
Check your GC output for a line like this:
GC CYCLE=<integer> CYCLE DURATION=1824337 EMPTY OLD MARK SET DURATION=33 SYNC1 DURATION=240 SYNC1->SYNC2 DURATION=0
In this case, the GC cycle took 1824337 milliseconds, which is about half an hour.
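If the GC output is long, the check above can be scripted. The following is a minimal sketch in Python, assuming the GC output was saved as plain text and that each cycle is reported on a line of the form shown above. The function names `cycle_durations_ms` and `ms_to_minutes` are hypothetical helpers, not part of any IBM tool.

```python
import re
from typing import List

# Matches only the "CYCLE DURATION=<n>" field. Other fields such as
# "EMPTY OLD MARK SET DURATION=" or "SYNC1 DURATION=" are not matched.
CYCLE_DURATION = re.compile(r"\bCYCLE DURATION=(\d+)")


def cycle_durations_ms(gc_output: str) -> List[int]:
    """Return every GC cycle duration (in milliseconds) found in the text."""
    return [int(m.group(1)) for m in CYCLE_DURATION.finditer(gc_output)]


def ms_to_minutes(duration_ms: int) -> float:
    """Convert milliseconds to minutes."""
    return duration_ms / 60000.0


if __name__ == "__main__":
    # The sample line from this technote, copied as-is.
    sample = (
        "GC CYCLE=<integer> CYCLE DURATION=1824337 "
        "EMPTY OLD MARK SET DURATION=33 SYNC1 DURATION=240 "
        "SYNC1->SYNC2 DURATION=0"
    )
    for ms in cycle_durations_ms(sample):
        print(f"{ms} ms = {ms_to_minutes(ms):.1f} min")  # 1824337 ms = 30.4 min
```
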
Resolving the problem
If the GC cycle took longer than a second or two, you need to tune the resources of the work pool in which the WBI Server Express system runs. Either increase CPU and/or memory manually to fixed values, or set the system value QPFRADJ to 2 or 3, which enables automatic performance adjustment so that the system can resize the pool on its own.
If your GC cycle looks fine (less than a second or two), some other thread(s) may be hung or running indefinitely while most or all other threads are in the "wait" state. In this case, refer to the technote above again, collect the data for hung JVMs, and follow the steps given in the technote to analyze the issue and/or get in touch with IBM support.
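The decision described above can be summarized as a small triage routine. This is only an illustrative sketch: the 2000 ms threshold is an assumption derived from the "a second or two" guideline, and the function name `triage` and its return labels are hypothetical.

```python
from typing import Iterable

# Assumption: "a second or two" is read as a 2-second threshold.
SLOW_GC_THRESHOLD_MS = 2000


def triage(durations_ms: Iterable[int],
           threshold_ms: int = SLOW_GC_THRESHOLD_MS) -> str:
    """Suggest the next step from a list of GC cycle durations in milliseconds."""
    durations = list(durations_ms)
    if not durations:
        # No GC cycle was captured, so collect GC information first.
        return "no-gc-data"
    if max(durations) > threshold_ms:
        # At least one long GC cycle: tune memory/CPU of the work pool
        # or enable automatic performance adjustment (QPFRADJ).
        return "tune-pool"
    # GC looks fine: analyze the JVM dumps for hung or looping threads.
    return "check-threads"


if __name__ == "__main__":
    print(triage([1824337]))   # tune-pool
    print(triage([350, 900]))  # check-threads
```
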
When opening a PMR, please supply the following information as well:
1) JVM Dumps you generated
2) GC trace information you generated
3) Operating System details (Version, Fix Pack/PTFs etc)
4) Exact version information for WBI Server Express
5) Exact version information for WebSphere MQ and DB2 (check for errors/problems in these components!)
6) Repository (export ICL from your System Manager, add custom/3rd party Java libraries if necessary)
7) If possible: steps and data to reproduce the problem