No SFG files are routed to the Consumer, and all BPs fail with status HALTED on a 2-node cluster

Technote (troubleshooting)


Problem(Abstract)

No Sterling File Gateway (SFG) files are routed to the Consumer, and all business processes (BPs) fail with status HALTED on a 2-node cluster.

Symptom

SFG routing failed because the BP MailboxEvaluateAllAutomaticRulesSubMin failed at Step 1 (System_Service) with an Advanced Status of 'General Engine Error'. The Status Report specified "caught error putting wfc into Queue:null Error from Listener". The SI System log contained the same wfc error.

In addition, the customer was asked to run the sample test BP named Validation_Sample_BPML, which failed with the same error. It turned out that all BPs failed with this error.

Cause

The SI cluster nodes were using UDP, the default protocol for cluster communication, and UDP traffic between the nodes had stopped working at the network level.

Environment

This is a 2-node cluster running on AIX.

Diagnosing the problem

The SI System logs contained the error "caught error putting wfc into Queue:null Error from Listener". The following logs and property files were collected from Nodes 1 and 2:

- jgroups_cluster.properties
- Noapp log and Noapp.dated log
- Sandbox.cfg
- WF.log
- customer_overrides.properties
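
When several log files are collected from both nodes, a small script can confirm which of them contain the error. The following is a minimal sketch, not part of the product tooling: the function name `find_queue_errors` is hypothetical, and the search string is the error text quoted in this technote.

```python
from pathlib import Path

# Error text quoted in the Status Report and the SI System log (from this technote).
ERROR_MARKER = "caught error putting wfc into Queue"


def find_queue_errors(log_paths, marker=ERROR_MARKER):
    """Return (path, line_number, line) for every log line containing marker."""
    hits = []
    for path in map(Path, log_paths):
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    hits.append((str(path), number, line.rstrip("\n")))
    return hits
```

Calling `find_queue_errors(["node1/system.log", "node2/system.log"])` (the paths here are placeholders) returns one tuple per matching line, which makes it easy to see whether both nodes report the error.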


Resolving the problem

New parameters were added to the customer_overrides.properties file on both nodes to switch cluster communication from UDP to TCP.

Update the customer_overrides.properties file on Node 1 with the following entry. Enter the property as a single line, with no line breaks:

###Node1 -- customer_overrides.properties
jgroups_cluster.property_string=TCP(bind_addr=123.30.39.196;start_port=23061):TCPPING(initial_hosts=123.30.39.196[23061],123.30.39.153[23061];port_range=1;timeout=5000;num_initial_members=2;up_thread=true;down_thread=true):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48;):VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):pbcast.NAKACK(max_xmit_size=60000;gc_lag=50;retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;down_thread=false;up_thread=false;max_bytes=0):VIEW_SYNC(avg_send_interval=60000;down_thread=false;up_thread=false):pbcast.GMS(print_local_addr=true;join_timeout=5000;join_retry_timeout=2000;shun=false;up_thread=true;down_thread=true)

Update the customer_overrides.properties file on Node 2 with the following entry. Enter the property as a single line, with no line breaks:

###Node2 -- customer_overrides.properties
jgroups_cluster.property_string=TCP(bind_addr=123.30.39.153;start_port=23061):TCPPING(initial_hosts=123.30.39.153[23061],123.30.39.196[23061];port_range=1;timeout=5000;num_initial_members=2;up_thread=true;down_thread=true):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48;):VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):pbcast.NAKACK(max_xmit_size=60000;gc_lag=50;retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;down_thread=false;up_thread=false;max_bytes=0):VIEW_SYNC(avg_send_interval=60000;down_thread=false;up_thread=false):pbcast.GMS(print_local_addr=true;join_timeout=5000;join_retry_timeout=2000;shun=false;up_thread=true;down_thread=true)

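The entries for Node 1 and Node 2 differ only in bind_addr and in the order of initial_hosts, where each node lists its own address first. Because the value is long and easy to mistype, it may help to generate it. The following is a minimal sketch for a two-node cluster only: the function name `build_property_line` is hypothetical, and the protocol stack and port 23061 are copied from the entries in this technote rather than from any product default.

```python
# Protocol stack copied from this technote: everything after the TCPPING section.
STACK_TAIL = (
    "MERGE2(min_interval=3000;max_interval=5000):"
    "FD_SOCK:"
    "FD(timeout=5000;max_tries=48;):"
    "VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):"
    "pbcast.NAKACK(max_xmit_size=60000;gc_lag=50;"
    "retransmit_timeout=100,200,300,600,1200,2400,4800;"
    "discard_delivered_msgs=true):"
    "pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;"
    "down_thread=false;up_thread=false;max_bytes=0):"
    "VIEW_SYNC(avg_send_interval=60000;down_thread=false;up_thread=false):"
    "pbcast.GMS(print_local_addr=true;join_timeout=5000;"
    "join_retry_timeout=2000;shun=false;up_thread=true;down_thread=true)"
)


def build_property_line(local_ip, peer_ip, port=23061):
    """Build the single-line customer_overrides.properties entry for one node.

    Assumes a two-node cluster, as in this technote: the local node is
    listed first in initial_hosts and num_initial_members is fixed at 2.
    """
    hosts = f"{local_ip}[{port}],{peer_ip}[{port}]"
    return (
        "jgroups_cluster.property_string="
        f"TCP(bind_addr={local_ip};start_port={port}):"
        f"TCPPING(initial_hosts={hosts};port_range=1;timeout=5000;"
        "num_initial_members=2;up_thread=true;down_thread=true):"
        + STACK_TAIL
    )
```

For example, `print(build_property_line("123.30.39.196", "123.30.39.153"))` produces the Node 1 entry, and swapping the two arguments produces the Node 2 entry.
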
Next:
1. Shut down both nodes.

2. On Node 1, issue ./run.sh restart

3. On Node 2, issue ./run.sh

4. Rerun the file upload test to MyFilegateway on Node 1. In this case the file was successfully added and routed to the consumer. The SI BP MailboxEvaluateAllAutomaticRulesSubMin ran successfully, which caused the routing to occur.
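
Since the nodes now depend on TCP, it can be useful to confirm that each node can open a TCP connection to the other node's cluster port once both are running. The following is a minimal sketch using only the Python standard library: the function name `can_connect` is hypothetical, and port 23061 is the start_port used in this technote. A successful connection shows only that something is listening on the port, not that the cluster has formed.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from Node 1, `can_connect("123.30.39.153", 23061)` probes Node 2; run from Node 2, `can_connect("123.30.39.196", 23061)` probes Node 1.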

Document information


More support for:

Sterling File Gateway

Software version:

2.2

Operating system(s):

AIX

Reference #:

1632209

Modified date:

2013-03-27
