IBM Support

Rational ClearCase operations hang with the error message "timed out trying to communicate with ClearCase remote server"

Troubleshooting


Problem

This technote describes some symptoms that will occur when the workload on an IBM Rational ClearCase UNIX server exceeds the capacity of the server, and some measures that can be taken to relieve them.

Symptom

Intermittently, and particularly during peak usage periods, users of both ClearCase Remote Client (CCRC) and the full ClearCase client (ClearCase Explorer, for example) experience hangs that usually end with a pop-up message: "timed out trying to communicate with ClearCase remote server".

Checking the ClearCase logs on the server shows messages like the following examples:

>cleartool getlog -around now 10 albd db vobrpc ccfs


=============================================================================
Log Name: albd                  Hostname: vobhost      Date: 2011-10-09T10:23:16+08:00
Selection: Lines between 2011-10-09T10:08:16+08:00 and 2011-10-09T10:38:16+08:00 displayed
-----------------------------------------------------------------------------
2011-10-09T10:18:04+08 albd_server(1319118): Error: Server vobrpc_server (pid=2809890) on "/vob_store/VOBs/aaa.vbs" died on startup; marking it as "down".
=============================================================================
Log Name: db                    Hostname: vobhost      Date: 2011-10-09T10:23:16+08:00
Selection: Lines between 2011-10-09T10:08:16+08:00 and 2011-10-09T10:38:16+08:00 displayed
-----------------------------------------------------------------------------
2011-10-09T10:11:35+08 db_server(880886): Error: albd_rgy_findbyuuid_entry call failed: RPC: Timed out
2011-10-09T10:11:35+08 db_server(880886): Error: Trouble contacting registry on host "vobhost": timed out trying to communicate with ClearCase remote server.
2011-10-09T10:11:35+08 db_server(880886): Error: Error searching for replica e49551ac.d21b11dc.a041.00:02:c3:0d:60:4c in registry: error detected by ClearCase subsystem
2011-10-09T10:11:46+08 db_server(1642540): Error: albd_server_idle call failed: RPC: Timed out
2011-10-09T10:11:46+08 db_server(1642540): Error: Error sending idle message to albd server: timed out trying to communicate with ClearCase remote server
=============================================================================
Log Name: vobrpc                Hostname: vobhost      Date: 2011-10-09T10:23:16+08:00
Selection: Lines between 2011-10-09T10:08:16+08:00 and 2011-10-09T10:38:16+08:00 displayed
-----------------------------------------------------------------------------
2011-10-09T10:13:09+08 vobrpc_server(1868278): Error: albd_sched_info call failed: RPC: Timed out
2011-10-09T10:14:19+08 vobrpc_server(2650278): Error: albd_sched_info call failed: RPC: Timed out
2011-10-09T10:14:41+08 vobrpc_server(1167414): Error: albd_server_busy call failed: RPC: Timed out
2011-10-09T10:15:17+08 vobrpc_server(3293642): Error: albd_sched_info call failed: RPC: Timed out
2011-10-09T10:15:26+08 vobrpc_server(1167414): Error: Unable to contact albd_server on host 'vobhost'
2011-10-09T10:15:26+08 vobrpc_server(1167414): Error: Operation "rgy_findbyuuid_entry" failed: timed out trying to communicate with ClearCase remote server.
2011-10-09T10:15:26+08 vobrpc_server(1167414): Error: Unable to get VOB object registry information for replica uuid "9d2d6700.862011dd.a055.00:02:c3:0d:60:4c" (vobhost:/vob_store/VOBs/aaa.vbs): error detected by ClearCase subsystem
=============================================================================
Log Name: ccfs                  Hostname: vobhost      Date: 2011-10-09T10:23:16+08:00
Selection: Lines between 2011-10-09T10:08:16+08:00 and 2011-10-09T10:38:16+08:00 displayed
-----------------------------------------------------------------------------
2011-10-09T10:21:16+08 albd_server(1319118): Error: ccfs_server(1236996): Error: Unable to contact albd_server on host 'vobhost'
2011-10-09T10:21:16+08 albd_server(1319118): Error: ccfs_server(1236996): Error: Operation "rgy_findbyuuid_entry" failed: timed out trying to communicate with ClearCase remote server.
2011-10-09T10:21:16+08 albd_server(1319118): Error: ccfs_server(1236996): Error: Unable to get VOB tag registry information for replica uuid "faab7d68.bae411df.8043.00:02:c3:0d:60:4c": timed out trying to communicate with ClearCase remote server
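To gauge how widespread the timeouts are, the getlog output can be summarized per log. The following is a minimal sketch, not part of ClearCase: count_timeouts is a hypothetical helper name, and it assumes that each section starts with a "Log Name:" header and that timeout lines contain "Timed out" or "timed out", as in the sample output above.

```shell
# Hypothetical helper (not a ClearCase command): reads `cleartool getlog`
# output on stdin and prints "<log name> <count>" for each log section,
# where count is the number of lines in that section mentioning a timeout.
count_timeouts() {
  awk '
    /^Log Name:/ {
      log_name = $3                     # third field is the log name, e.g. "albd"
      if (!(log_name in seen)) {
        seen[log_name] = 1
        order[++n] = log_name           # remember first-seen order
        hits[log_name] = 0
      }
      next
    }
    # Matches both "RPC: Timed out" and "timed out trying to communicate"
    log_name != "" && /[Tt]imed out/ { hits[log_name]++ }
    END {
      for (i = 1; i <= n; i++) print order[i], hits[order[i]]
    }
  '
}

# Usage, assuming cleartool is on the PATH of the server:
#   cleartool getlog -around now 10 albd db vobrpc ccfs | count_timeouts
```

Timeouts spread across several logs in the same time window, rather than confined to one server process, are consistent with the overload symptom described here.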

Cause

The problem can be caused by a UDP buffer overrun. An overrun occurs when too many ClearCase roles are assigned to one machine and/or when a large number of UDP packets arrive simultaneously or within a short time interval, so that the machine cannot handle the load.

In general, ClearCase scales best horizontally across multiple machines instead of vertically on a machine with massive resources. The albd server (registry server), VOB server, view server, and credmap server all rely on UDP packet communication. If the machine is not tuned appropriately, or does not have enough resources to process incoming UDP packets before its UDP receive buffer fills, the excess packets are dropped.
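One way to check whether the receive buffer is actually overflowing is to watch the UDP overflow counter in the netstat statistics. The following is a sketch under stated assumptions: that AIX reports a "socket buffer overflows" line in `netstat -p udp`, and that Solaris reports a `udpInOverflows` counter in `netstat -s`. Counter names and layout may differ by OS release, and udp_overflows is a hypothetical helper name.

```shell
# Hypothetical helper: reads netstat statistics on stdin and prints the
# UDP receive-overflow counter. Returns non-zero if no counter is found.
# The counter names below are assumptions about the netstat output format.
udp_overflows() {
  awk '
    # AIX style: "<count> socket buffer overflows"
    /socket buffer overflows/ { print $1; found = 1; exit }
    # Solaris style: "udpInOverflows = <count>"
    match($0, /udpInOverflows[ \t]*=[ \t]*[0-9]+/) {
      value = substr($0, RSTART, RLENGTH)
      sub(/^[^=]*=[ \t]*/, "", value)   # keep only the number after "="
      print value; found = 1; exit
    }
    END { if (!found) exit 1 }
  '
}

# Usage:
#   netstat -p udp | udp_overflows    # AIX
#   netstat -s | udp_overflows        # Solaris
```

Running the check twice a few minutes apart during a peak period is more informative than a single reading: a value that keeps growing suggests packets are being dropped.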

Environment

The VOB server, view server, registry server, and license server are all configured on a single machine.

Resolving The Problem

To relieve the issue, use either or both of the following options:

Option 1: Increase the UDP buffer sizes on the affected server machine

Solaris 9 example:


ndd -set /dev/udp udp_max_buf 8388608

ndd -set /dev/udp udp_xmit_hiwat 65535 

ndd -set /dev/udp udp_recv_hiwat 65535

Solaris 10 example:

ndd -set /dev/udp udp_max_buf 8388608

ndd -set /dev/udp udp_xmit_hiwat 8388608

ndd -set /dev/udp udp_recv_hiwat 8388608

Solaris 11 example:

ipadm set-prop -p max_buf=8388608 udp

ipadm set-prop -p send_buf=8388608 udp

ipadm set-prop -p recv_buf=8388608 udp

AIX example:


no -p -o udp_recvspace=655360

no -p -o sb_max=1310720

Reference: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/udp_recvspace.htm
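After changing the values, it can help to confirm that the new settings are in effect. The following is a minimal sketch: udp_show_cmds is a hypothetical helper that only prints the display commands for the parameters set above, selected by OS name and release. It assumes that `no -o <tunable>` (AIX), `ndd /dev/udp <parameter>` (Solaris 9 and 10), and `ipadm show-prop` (Solaris 11) display the current values on your release.

```shell
# Hypothetical helper: prints (does not run) the commands that display the
# UDP buffer settings changed above. OS name and release default to uname.
udp_show_cmds() {
  os=${1:-$(uname -s)}
  rel=${2:-$(uname -r)}
  case "$os:$rel" in
    AIX:*)
      echo "no -o udp_recvspace"
      echo "no -o sb_max"
      ;;
    SunOS:5.11*)
      # Solaris 11 uses ipadm properties
      echo "ipadm show-prop -p max_buf,send_buf,recv_buf udp"
      ;;
    SunOS:*)
      # Solaris 9 and 10 use ndd parameters
      for p in udp_max_buf udp_xmit_hiwat udp_recv_hiwat; do
        echo "ndd /dev/udp $p"
      done
      ;;
    *)
      echo "unsupported OS: $os" >&2
      return 1
      ;;
  esac
}

# Usage: review the printed commands, then run them as root, for example:
#   udp_show_cmds
#   udp_show_cmds | sh
```

Printing the commands instead of running them keeps the helper safe to try on any host, and the output can be pasted into a change record.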

Option 2: Split ClearCase roles across multiple machines

  • Use a separate, dedicated license and/or registry server
  • Use a separate, dedicated VOB and/or view server


Document Information

Modified date:
16 June 2018

UID

swg21577649