Troubleshooting Process Engine Farms in version 5.0
There are many things that you must correctly configure in a Process Engine (PE) 5.0 farm. This troubleshooting document helps you identify and resolve the most common issues by guiding you through a process to verify a PE 5.0 farm configuration.
Throughout this document, we reference example names and IP addresses of Process Engine servers and the Process Engine virtual server. The following table contains the example configuration that is used in the example commands and the example expected results:
|Server or Device||Description||IP Address|
|Load Balancer||The device itself||10.20.30.100|
|PEFarm||A Process Engine Server Virtual Host as defined in the Load Balancer||10.20.30.40|
|PEFarmTest||Another Process Engine Server Virtual Host as defined in the Load Balancer||10.20.30.50|
|PE1||A server running PE software||10.20.30.41|
|PE2||A server running PE software||10.20.30.42|
|PE3||A server running PE software||10.20.30.43|
|PE Virtual Server||Main Port||Naming Service Port|
Throughout this document, the commands and expected results are relative to the example IP addresses and example ports shown above. As you work through the steps in this document, you must adjust the IP addresses and possibly the ports in the various commands and expected results so as to be relevant to your own PE farm.
This technote is for PE farms that use a hardware load balancer. Software load balancers might have different configuration requirements.
This technote references IPv4 addresses, but all of the steps are applicable to IPv6 networks as well.
Resolving the problem
Understanding Process Engine 5.0 Farms
Configuring a Process Engine farm in PE 5.0 is significantly different from configuring a Process Engine farm in PE 4.5.1.
This document is specific to Process Engine 5.0 farms. If you're trying to configure a PE 4.5.1 farm, please see this technote:
PE 5.0 includes an enhancement that lets you run multiple separate PE servers on a single operating system instance.
An "operating system instance" is a separate physical server, a separate Solaris zone, a separate AIX LPAR or WPAR, a separate Windows VM, or something similar.
On earlier PE releases, you could only run a single instance of PE on an operating system instance. Now, you can run multiple instances of PE on one operating system instance.
In PE 5.0, any PE instance is called a "PE Virtual Server" – even if you only run one instance of the PE on the operating system instance.
A PE Virtual Server is not necessarily tightly tied to farming. You can run multiple PE Virtual Servers on a single physical computer in an environment that is not a farm. In this case, each instance of the PE is a completely separate Process Engine.
The "PE Virtual Server" (that supports multiple PE instances on a single operating system instance) is a completely separate concept from the "Process Engine Server Virtual Host" that you define in the load balancer when configuring a farm.
On a single operating system instance, each PE Virtual Server must be configured to have its own, unique name. The PE Virtual Server name is passed into the Process Task Manager when you run it and that tells that instance of PTM what PE Virtual Server it will be managing.
In Process Task Manager, on the General tab of the Process Engine node, the Process Engine Virtual Server Name field tells you what "PE Virtual Server" is being managed.
Each PE Virtual Server on a particular operating system instance must be configured to use its own unique values for the Process Engine Main Port and for the Process Engine Naming Service Port. For example, if you configure three PE Virtual Servers on a single operating system instance, the first one could use 32776 and 32777, but the other two would need to be configured to use other ports. The second PE Virtual Server might use ports 32778 and 32779, and the third PE Virtual Server might use ports 32780 and 32781.
Similarly, each PE Virtual Server on a particular operating system instance must be configured to use its own, separate PE database.
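The port rule above can be checked mechanically. The following is a minimal sketch, assuming you keep a plain-text inventory with one line per PE Virtual Server in the hypothetical format "name port port". The function name find_port_conflicts and the inventory format are illustrative only and are not part of the PE software.

```shell
#!/bin/sh
# Sketch only: report any port that is assigned more than once in a
# hypothetical inventory file whose lines look like
#   <pe-virtual-server-name> <port> <port>
# The file format and the function name are assumptions for illustration.
find_port_conflicts() {
    awk '
        NF >= 3 {
            for (i = 2; i <= 3; i++) {
                if ($i in owner) {
                    printf "port %s is used by both %s and %s\n", $i, owner[$i], $1
                    conflicts++
                } else {
                    owner[$i] = $1
                }
            }
        }
        # Exit status 1 when at least one conflict was found, 0 otherwise.
        END { exit (conflicts > 0) }
    ' "$1"
}
```

For example, an inventory that lists VirtualPE1 with 32776 and 32777 and VirtualPE2 with 32778 and 32779 produces no output and a zero exit status.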
In PE 5.0, if you're configuring a Process Engine farm, it is important to understand that
You cannot create multiple PE Virtual Servers on a single operating system instance and then put them into the same farm, even for test purposes.
As you're defining and configuring a PE farm, follow the documentation closely. Among other things, it will tell you to
- Install the Process Engine software on all operating system instances that will host members of the farm.
- Define and configure the virtual server on the first PE server (e.g., PE1).
- Match the peseed.jars on all PE Servers (described in the documentation and below).
- In Process Task Manager on that first PE Server, right-click -> "New" on the "Instance" node to create a new farm member. Do not go to the second PE Server and try to configure it as a member of the farm.
If there are problems adding a new instance to the farm, we can use the command line peinit utility on PE1 to configure the other members of the farm:
- copy the ../ProcessEngine/data/pesvr.PEVirtual1/vwserver.ini file from the first server (e.g., PE1) to the ../ProcessEngine/data/ directory
- run peinit:
../ProcessEngine/peinit PEVirtual1 -V ../ProcessEngine/data/vwserver.ini -l PE1+PE2+PE3
When you have properly configured a PE farm, you will see (in Process Task Manager) multiple servers listed under the "Instances" node for the PE Virtual Server.
The troubleshooting procedure comprises the following general steps that are described below:
Make sure that all of the peseed.jar files match
The documentation currently tells you to copy peseed.jar to all servers that host members of the farm. But, that important step is only mentioned in the install and configuration of Windows-based PE Servers. The peseed.jar file must be copied on Unix-based PE Servers as well.
The peseed.jar is created during the install of the PE software on a server. All servers that host PE Virtual Servers that are members of a farm must have identical peseed.jar files. It doesn't really matter which PE Server's peseed.jar you select to copy to the other PE Servers. But, whichever peseed.jar you choose to use, that exact same peseed.jar must be put on all of the servers.
On each of the servers that have the PE software installed, make sure that the peseed.jar is identical.
On a Unix-based server, the peseed.jar file might be here:
On a Windows-based server, peseed.jar might be here:
Make sure you have copied the exact same peseed.jar to all PE servers that are members of the same farm.
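One way to confirm that the files are identical is to compare checksums. The following is a minimal sketch, assuming you have gathered a copy of peseed.jar from each server into one place (the file names in the usage comment are hypothetical). It relies only on the POSIX cksum and awk utilities, and the function name files_identical is illustrative.

```shell
#!/bin/sh
# Sketch only: succeed when every file named on the command line has the
# same CRC and size. cksum prints "<crc> <size> <name>" for each file.
files_identical() {
    cksum "$@" | awk -v want="$#" '
        { sig = $1 " " $2 }
        NR == 1 { first = sig }
        sig != first { differ = 1 }
        # Fail if any file differs or if cksum could not read a file.
        END { exit (differ || NR != want) }
    '
}

# Hypothetical usage, after copying each server's peseed.jar locally:
#   files_identical PE1.peseed.jar PE2.peseed.jar PE3.peseed.jar \
#       && echo "peseed.jar files match" \
#       || echo "peseed.jar files differ"
```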
If you fail to copy the peseed.jar to all servers in the farm, the PE Virtual Servers in the farm will not be able to properly communicate with each other, and the PTM on one of the servers will not know the proper state of the other members of the farm. You might also see errors such as the following:
Configuring the load balancer's "Process Engine Server Virtual Host"
We will describe an example configuration so that you can see how a farm should be configured.
In the example farm configuration:
This lets us configure two separate farms of three PE Virtual Servers each:
In this configuration, if an operating system instance (e.g., PE1) goes down, that would take down the two PE Virtual Servers on PE1, leaving each of the farms with two remaining PE Virtual Servers (on PE2 and PE3).
The load balancer's Process Engine Server Virtual Host name for the first PE farm is PEFarm.
PEFarm is defined to have PE Servers on three hosts: PE1, PE2, and PE3. PEFarm is configured to use ports 32776 and 32777. Therefore, references to PEFarm are sent to PE1, PE2, or PE3 and use 32776 and 32777. That will access the VirtualPE1 servers on PE1, PE2, and PE3 because VirtualPE1 is configured to use those same ports (32776 and 32777).
The name of the load balancer's Process Engine Server Virtual Host (PEFarm in the example) does not need to match the "PE Virtual Server" name as defined in Process Task Manager (VirtualPE1 in the example).
The definition of the Process Engine Server Virtual Host in the load balancer specifies the ports and host names of the PE Virtual Servers that are members of the farms. The host names and ports (as configured in the load balancer's Process Engine Server Virtual Host) will get users of PEFarm to the correct PE Virtual Servers on one of the correct operating system instances.
The load balancer's Process Engine Server Virtual Host name for the second PE farm is PEFarmTest.
PEFarmTest is defined to have PE Virtual Servers on the same three hosts as PEFarm. But because PEFarmTest is configured to use ports 32778 and 32779, references to PEFarmTest will go to VirtualPE2 on those three hosts.
The following screens show what you should see in the Process Task Manager on the "General" tab of the "Process Engine" node on those three servers:
Running vwtaskman VirtualPE1 on any of PE1, PE2, or PE3 shows us this:
VirtualPE1 is defined on all three servers. It is configured to use ports 32776 and 32777. The load balancer and connection points are configured to use the name PEFarm to get to this farm of PE Virtual Servers.
Running vwtaskman VirtualPE2 on any of PE1, PE2, or PE3 shows us this:
VirtualPE2 is defined on all three servers. It is configured to use ports 32778 and 32779. The load balancer and connection points are configured to use the name PEFarmTest to get to this farm of PE Virtual Servers.
If you run Process Task Manager for VirtualPE1 on PE1, PE2, or PE3, you are able to see and control the current state of all three instances of VirtualPE1 - the PE Virtual Servers that are members of PEFarm.
Applications that need to talk to PEFarm would use a connection point that sends requests to the load balancer's virtual server named PEFarm. In the load balancer, PEFarm would be configured to use ports 32776 and 32777.
The load balancer would then forward the requests to its choice of PE1, PE2, or PE3, and would forward the requests on ports 32776 and 32777.
This would result in the work being processed on PE1, PE2, or PE3, by a PE Virtual Server that is listening on 32776 and 32777. That PE Virtual Server would be VirtualPE1 (on one of those hosts).
Check hosts files on all of the PE Servers
For PE 5.0 farms, you don't need anything special in the hosts files.
The Process Engine no longer requires that the hosts files of each PE Server "alias" the load balancer's Process Engine Server Virtual Host name to the IP of the local PE Server, as it did on PE 4.5.1 farms. If that still exists in your hosts files, remove it.
For example, if you see a line in the hosts file of PE1 that looks like the following:
10.20.30.41 PE1 PEFarm
Remove the PEFarm reference so that the entry now looks like the following:
10.20.30.41 PE1
If your DNS properly resolves PE1 as 10.20.30.41, you could completely eliminate that entry in the hosts file.
The hosts files for PE2 and PE3 might need similar changes.
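A leftover PE 4.5.1 alias can be found by scanning the hosts file for the farm name mapped to any address other than the Process Engine Server Virtual Host IP. The following is a minimal sketch under that assumption. The function name find_stale_farm_alias is illustrative, and the example values in the usage comment come from the example configuration in this document.

```shell
#!/bin/sh
# Sketch only: print hosts-file entries that map the farm name (short or
# fully qualified) to an IP other than the load balancer virtual host IP.
#   $1 = hosts file, $2 = farm name, $3 = virtual host IP
# Exit status is 1 when a stale alias is found, 0 otherwise.
find_stale_farm_alias() {
    awk -v farm="$2" -v vip="$3" '
        { sub(/#.*/, "") }          # ignore comments
        NF < 2 { next }             # skip blank or malformed lines
        $1 != vip {
            f = tolower(farm)
            for (i = 2; i <= NF; i++) {
                n = tolower($i)
                if (n == f || index(n, f ".") == 1) {
                    print
                    found = 1
                    break
                }
            }
        }
        END { exit found }
    ' "$1"
}

# Example usage on a Unix-based PE Server:
#   find_stale_farm_alias /etc/hosts PEFarm 10.20.30.40
```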
Verify vworbbroker.endPoint settings
PE 5.0 farms do not use the vworbbroker.endPoint settings. If this farm is an upgrade from a PE 4.5.1 farm, you might have old settings in Process Task Manager that should be deleted. If there are vworbbroker.endPoint settings on the Process Task Manager "Advanced" tabs, you should delete them.
Check the Connection Point configuration
In the Content Engine's Enterprise Manager, where connection points are defined, verify that all connection points for all regions specify the load balancer's Process Engine Server Virtual Host name as the DNS name. Connection points in a farmed environment should not directly reference a specific PE server. In the example configuration, connection points should reference PEFarm (or PEFarmTest) and not PE1, PE2, or PE3.
Check the Load Balancer's Process Engine Server Virtual Host PE Server Pool
The load balancer will have a pool defined for the process engines.
|Load Balancing Method||Round Robin|
|Members of the Pool||PE1 with Service Port 0|
|PE2 with Service Port 0|
|PE3 with Service Port 0|
Alternatively, you can define two pools, one for port 32776 and one for port 32777 and put them both in the same PEFarm configuration. In this case, you would see the following:
|Load Balancing Method||Round Robin|
|Members of the Pool||PE1 with Service Port 32776|
|PE2 with Service Port 32776|
|PE3 with Service Port 32776|
|Load Balancing Method||Round Robin|
|Members of the Pool||PE1 with Service Port 32777|
|PE2 with Service Port 32777|
|PE3 with Service Port 32777|
Then, create your Process Engine Server Virtual Host (i.e., PEFarm in our example configuration) to include either
- PE_Pool or
- PE_Pool_32776 and PE_Pool_32777.
Do not try to combine PE1:32776 and PE1:32777 in a single pool.
Check the Load Balancer's Port Forwarding Configuration
In the load balancer, make sure that the Process Engine Naming Service Port (32776 by default) is set to forward HTTP traffic. Our example configuration has two virtual servers, so the naming ports are 32776 and 32778, and both must be configured to forward HTTP traffic.
In the load balancer, make sure that the Process Engine Main Port (32777 by default) is set to forward TCP traffic. Our example configuration has two virtual servers, so the main ports are 32777 and 32779, and both must be configured to forward TCP traffic.
Check the Load Balancer's Process Engine Server Virtual Host Health Monitors
Verify that the following two health monitors are defined:
- Monitor 32776 with an HTTP monitor for "IOR/ping" (interval 30 seconds, timeout 91 seconds).
Note: this timeout setting differs from the older recommendation in the P8 High Availability Redbook. On this health check, you should configure the load balancer to check that the returned HTTP response includes "Process Engine"
For troubleshooting or to simplify the configuration, you can eliminate the 32776 monitor and just monitor 32777.
Make sure the monitor types are correct. Be sure to use TCP for port 32777. And be sure to use HTTP for the IOR/ping to port 32776.
Incorrectly configured health checks can cause a load balancer to take a Process Engine instance out of the farm even though that Process Engine is functioning correctly.
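When a monitor marks a server down unexpectedly, it can help to reproduce the HTTP check by hand against each pool member. The following is a minimal sketch, assuming curl is installed on the machine where you run it. The function names are illustrative, and the only thing tested is the "Process Engine" marker text mentioned above.

```shell
#!/bin/sh
# Sketch only: succeed when the page text on stdin contains the marker
# that the HTTP health monitor is configured to look for.
page_has_marker() {
    grep -q "Process Engine"
}

# Fetch IOR/ping from one pool member and apply the same test.
#   $1 = host name or IP, $2 = Naming Service Port
# Assumes curl is available; a 10-second limit keeps a dead host from hanging.
check_member() {
    if curl -s --max-time 10 "http://$1:$2/IOR/ping" | page_has_marker; then
        echo "$1:$2 healthy"
    else
        echo "$1:$2 FAILED"
        return 1
    fi
}

# Example usage with the example configuration:
#   for h in PE1 PE2 PE3; do check_member "$h" 32776; done
```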
Test network connectivity and Domain Name Resolution
Servers must be able to see each other and resolve each other's names. And the servers must be able to resolve the load balancer's Process Engine Server Virtual Host name.
In all of the cases below, if there are any connectivity or name resolution issues, those problems need to be resolved.
Usually, DNS will recognize and resolve the short, unqualified names of the servers (e.g., PE1 and PEFarm). But you might find that you need to use the fully qualified names (e.g., PE1.ibm.com and PEFarm.ibm.com) in pings, nslookups and in various configuration settings. To get the names to properly resolve, you can use any of the following remedies:
- Adjust the hosts files on all of the servers so that both the short and fully qualified names are resolved to their correct IP addresses. For example, the hosts file on PE1 might include entries such as these:
10.20.30.40 PEFarm PEFarm.ibm.com
In the commands below, we reference the short, unqualified names of the servers (e.g., PE1 and PEFarm).
On the AE and CE servers, PEFarm should resolve properly. If DNS isn't resolving PEFarm, you should configure DNS to resolve PEFarm. Or, you could consider adjusting the hosts files on the AE and CE servers to include a line that causes the PEFarm name to resolve to the IP of the Process Engine Server Virtual Host (as defined in the load balancer). For example, using our example configuration, the hosts files on the AE and CE servers could be modified to contain a line similar to the following:
10.20.30.40 PEFarm PEFarm.ibm.com
Verify that PE services are functional from each PE server
From each PE server, in a web browser, hit this URL: http://localhost:32776/IOR/ping
An expected response from doing this on PE1 is similar to the following:
Note that we hit port 32776 and that port is associated with the PE Virtual Server named VirtualPE1 (and with the Process Engine Server Virtual Host named PEFarm). In this case, the PE Server at 10.20.30.41 responded, because that's the "localhost" where the web browser was running when it hit that page.
Run a web browser on PE2 and on PE3 and hit the same "localhost" URL. When you do this on PE2 and PE3, you should see the IP listed in the "Local Host" row of the response changing to reflect the current PE Server where the IOR/ping is being run.
Then, from each PE server, hit this URL: http://localhost:32776/IOR/FileNet.PE.vworbbroker
The IOR/FileNet.PE.vworbbroker will return a simple page that contains "IOR:" followed by a long string of hex digits. An expected response from doing this on PE1 would be similar to the following:
Repeat this process on PE2 and PE3, verifying that an "IOR string" is returned.
This shows that the individual PE Servers are responding.
If you don't have a web browser on the PE servers, you can use the ConfigUtils program. In a command prompt window on the PE server, you can retrieve the HTML page from an HTTP server and display the resulting HTML page by using a command similar to the following:
For a Unix-based PE Server:
java -classpath /opt/IBM/FileNet/ProcessEngine/lib/pe.jar:/opt/IBM/FileNet/ProcessEngine/lib/pe3pt.jar filenet.vw.server.ConfigUtils /url http://localhost:32776/IOR/ping
For a Windows-based PE Server:
java -classpath "D:\Program Files\IBM\FileNet\ProcessEngine\lib\pe.jar";"D:\Program Files\IBM\FileNet\ProcessEngine\lib\pe3pt.jar" filenet.vw.server.ConfigUtils /url http://localhost:32776/IOR/ping
You might need to adjust the exact locations of Java and of the jar files in the classpath. Note that the ConfigUtils Java application does not render the HTML page. It displays the HTML text exactly as retrieved.
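Because the retrieved text is printed as-is, the IOR check can be scripted. The following is a minimal sketch that reads the retrieved text on standard input and succeeds when it contains "IOR:" followed by a run of hex digits. The minimum length of 16 digits is an assumption chosen for illustration, and the function name is_ior_page is hypothetical.

```shell
#!/bin/sh
# Sketch only: succeed when stdin contains "IOR:" followed by at least
# 16 hex digits. The threshold of 16 is an illustrative assumption.
is_ior_page() {
    grep -Eq 'IOR:[0-9A-Fa-f]{16,}'
}

# Example usage on a Unix-based PE Server (adjust paths and port):
#   java -classpath /opt/IBM/FileNet/ProcessEngine/lib/pe.jar:/opt/IBM/FileNet/ProcessEngine/lib/pe3pt.jar \
#       filenet.vw.server.ConfigUtils /url http://localhost:32776/IOR/FileNet.PE.vworbbroker \
#       | is_ior_page && echo "IOR string returned"
```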
Verify HTTP connections from the Application Engine and Content Engine
From the AE server(s) and from the CE server(s), in a web browser, access the following URLs:
Note that this time, you are referencing the PEFarm - the load balancer's "Process Engine Server Virtual Host" rather than localhost or a specific PE server.
The http://PEFarm:32776/IOR/FileNet.PE.vworbbroker should return a page similar to the following:
The http://PEFarm:32776/IOR/ping should return a page similar to the following:
This shows that the PE farm is responding to calls from the AE and from the CE and that the PE1 server responded to that particular ping.
Refresh the browser several times using the IOR/ping URL. The responses should come from various servers in the PEFarm pool. Check the "Local Host" IP address in the response after each refresh. When a refresh reports a different IP address, that demonstrates that load balancing is occurring and that the response came from a different member of the PE farm.
If there is no web browser on the AE or CE server, you can use the ConfigUtils program on the AE or CE as described above.
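The repeated-refresh test can also be scripted. The following is a minimal sketch that tallies which farm members answered a series of IOR/ping requests. It assumes that the IPv4 address appears on the same line of the retrieved text as the "Local Host" label, which might not match the actual page layout in your environment, so adjust the pattern as needed. The function name tally_responders is illustrative.

```shell
#!/bin/sh
# Sketch only: read one or more IOR/ping responses on stdin and print
# "<count> <ip>" for each distinct responder.
# Assumption: the "Local Host" label and an IPv4 address share a line.
tally_responders() {
    awk '
        /Local Host/ {
            if (match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
                seen[substr($0, RSTART, RLENGTH)]++
            }
        }
        END { for (ip in seen) print seen[ip], ip }
    ' | sort -k2
}

# Example usage from an AE or CE server, assuming curl is installed:
#   for i in 1 2 3 4 5 6; do
#       curl -s --max-time 10 http://PEFarm:32776/IOR/ping
#   done | tally_responders
```

If only one IP address is ever listed, check the load balancing method and the health monitors before concluding that the farm is misconfigured.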
Verify the Connections to the PE Servers
This test uses the IOR string that is returned via the "Naming Service Port" (e.g., 32776) to get the info on the PE server's configured "Main Port" (e.g., 32777). It then accesses that port with a "ping" RPC.
From each PE server, run the ORBServiceHelper program and verify that a good response is returned:
For a Unix-based PE Server, run this command:
java -classpath /opt/IBM/FileNet/ProcessEngine/lib/pe.jar:/opt/IBM/FileNet/ProcessEngine/lib/pe3pt.jar:/opt/IBM/FileNet/ProcessEngine/CE_API/lib/Jace.jar filenet.pe.peorb.client.ORBServiceHelper /rpc ping /host localhost /port 32776
For a Windows-based PE Server, run this command:
java -classpath "D:\Program Files\IBM\FileNet\ProcessEngine\lib\pe.jar";"D:\Program Files\IBM\FileNet\ProcessEngine\lib\pe3pt.jar";"D:\Program Files\IBM\FileNet\ProcessEngine\CE_API\lib\jace.jar" filenet.pe.peorb.client.ORBServiceHelper /rpc ping /host localhost /port 32776
An expected response should look something like the following:
Pinged localhost:32776. Server response = [1320182274264
Aloha from JPE ORBBROKER RPC :From TEST Client
In all cases, you might need to add the full path to the java program, you might need to adjust the paths to the jars in the classpath, and you might need to adjust the port value from 32776 (use the Process Engine Naming Service Port for the PE Virtual Server you are checking).
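To avoid retyping the long classpath, the Unix command above can be wrapped in small helper functions. The following is a minimal sketch. The function names pe_classpath and orb_ping_ok are hypothetical, the jar locations are taken from the Unix example above, and the success test looks only for the greeting text shown in the expected response.

```shell
#!/bin/sh
# Sketch only: build the Unix classpath for ORBServiceHelper from the
# PE installation directory passed as $1.
pe_classpath() {
    printf '%s/lib/pe.jar:%s/lib/pe3pt.jar:%s/CE_API/lib/Jace.jar\n' "$1" "$1" "$1"
}

# Succeed when the ORBServiceHelper output on stdin contains the greeting
# from the expected response.
orb_ping_ok() {
    grep -q "Aloha from JPE ORBBROKER RPC"
}

# Example usage (adjust the installation directory and the port):
#   java -classpath "$(pe_classpath /opt/IBM/FileNet/ProcessEngine)" \
#       filenet.pe.peorb.client.ORBServiceHelper /rpc ping /host localhost /port 32776 \
#       | orb_ping_ok && echo "ping OK"
```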
Verify the connections from the AE Server to the PE Servers via the Process Engine Server Virtual Host
Run the same ORBServiceHelper program on the AE Server, but have it access PEFarm rather than a specific PE server.
For a Unix-based AE Server, run this command:
java -classpath pe.jar:pe3pt.jar:Jace.jar filenet.pe.peorb.client.ORBServiceHelper /rpc ping /host PEFarm /port 32776
For a Windows-based AE Server, run this command:
java -classpath pe.jar;pe3pt.jar;jace.jar filenet.pe.peorb.client.ORBServiceHelper /rpc ping /host PEFarm /port 32776
On the AE Server, you will need to find the full paths to the three required jars and adjust the classpath accordingly.
Verify that the same "Aloha..." response is generated.
Check Timeout configurations
Other software or devices might be monitoring ports, network sockets, and connections, and closing them after certain periods of inactivity have elapsed. For example, AppDirector (from Radware) can do this kind of thing. AppDirector can specify timeouts that can close sessions and this can then cause TCP resets to be issued from the Process Engine servers. This will result in errors such as:
ORB Timeout org.omg.CORBA.COMM_FAILURE: purge_calls:1451 reason=1 state=5 vmcid: IBM minor code: 306 completed: Maybe
This is a complex area, with various timeouts and keepalive settings. Here, we can only provide some high-level advice.
For example, we have seen a case where a load balancer was closing connections unexpectedly and the fix was to set the WebSphere "keep alive" settings to 20 half-seconds:
tcp_keepidle = 20
tcp_keepinit = 20
tcp_keepintvl = 20
These settings cause a "keep alive" packet to be sent every 10 seconds (every 20 "half-seconds"), preventing the load balancer from dropping the connections.
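Because these values are expressed in half-seconds, it is easy to misjudge the resulting interval. The following is a small sketch of the conversion. The function name halfsec_to_sec is illustrative.

```shell
#!/bin/sh
# Sketch only: convert a value given in half-seconds to seconds.
halfsec_to_sec() {
    awk -v h="$1" 'BEGIN { printf "%g\n", h / 2 }'
}

# Example: halfsec_to_sec 20 prints 10, matching the 10-second
# keep-alive interval described above.
```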
Check Time-To-Live setting if using Dynamic DNS
If you access your Process Engine farms through a load balancer configured to use a dynamic DNS server, you must configure the Time-To-Live (TTL) setting for your environment. Follow the instructions in "Configure the Time-To-Live value for load balancers that use dynamic DNS servers" in the IBM High Availability Solution for IBM FileNet P8 Systems Redbook.
Verify required networking configuration
Read through the P8 5.0 Performance and Tuning Guide and make sure the CORBA FragmentSize parameter is set to 0 and that the JVM ConnectionMultiplicity and Threadpool settings are configured appropriately.
Make sure all of the documented configuration tasks have been done as described in the Plan and Prepare Your Environment for FileNet P8 guide:
Many platform-specific network configuration settings are discussed in that document, including making sure NetBIOS over TCP/IP is enabled on Windows servers and Time-To-Live is set correctly.
The Process Engine is stateless. There is no requirement to enable "sticky sessions" or "session affinity" in a load balancer for the Process Engine 5.0 farm ports (e.g., 32776 and/or 32777). Note that the Application Engine does require sticky sessions as described in this P8 5.0 documentation reference.
Information to gather and submit if problems still exist
- Load Balancer configuration info, including
- health checks.
Include a description of the symptom or problem that you're seeing, and any other pertinent info you have about your environment.
Depending on the symptoms of the problem, it might be advisable to get some traces from the Process Engine server. In vwtool, you can enable the FARM tracing, the ORB tracing, and the IOR tracing. For example:
Performed on all servers/local server/ or vwtool (CR=a, l, v):
PE API ORB pool minimum = 1, maximum = 10
Enter new save options (^C to exit): y
Save via log file? (y/CR=n): y
Trace files will be saved in directory: D:\Program Files\IBM\FileNet\ProcessEngine\data\pesvr.default\logs\
Extract new options from the traceOptions file ? (y/CR=n):
Enter new content options:
Trace external RPCs? (y/CR=n):
Trace Object Service RPCs? (y/CR=n):
Trace database access? (y/CR=n):
Trace database timings? (y/CR=n):
Trace database transaction? (y/CR=n):
Trace Instruction Sheet Interpreter? (y/CR=n):
Trace Log Manager? (y/CR=n):
Trace email notification? (y/CR=n):
Trace SEC calls? (y/CR=n):
Trace Workflow termination? (y/CR=n):
Trace Transfer? (y/CR=n):
Trace Rules? (y/CR=n):
Trace Web Services? (y/CR=n):
Trace envcache access? (y/CR=n):
Trace exceptions? (y/CR=n):
Trace Farming? (y/CR=n): y
Trace Stored Procedure Calls? (y/CR=n):
Trace Expression Parsing? (y/CR=n):
Trace RDB Objects? (y/CR=n):
Trace RDB Time? (y/CR=n):
Trace Application Space? (y/CR=n):
Trace ORB? (y/CR=n): y
Trace ORB Input? (y/CR=n): y
Trace ORB Output? (y/CR=n): y
Trace database outputs? (y/CR=n):
Note that this tracing can generate a lot of data very quickly, so you would generally enable those traces, reproduce whatever problem you're seeing, and then disable the traces.
If you need to run the traces for a longer period of time, you should probably turn off the ORB Input and ORB Output traces.
Run the Process Engine 5.0 ISA Lite Data Collector, described here:
Advanced troubleshooting can involve gathering network traces. This might involve using a program such as Wireshark to verify that network traffic is getting properly routed to and from the various servers and load balancers as expected. Depending on the problem, this kind of network tracing might be required.
Software version: 5.0, 5.1.0
Operating system(s): AIX, HP-UX, Linux, Solaris, Windows
Reference #: 1570520
Modified date: 18 November 2016