How to diagnose and troubleshoot a Cast Iron Secure Connector issue
A Secure Connector instance may sometimes be shown as "Stopped" in the Cast Iron Live Web Management Console (WMC). When this happens, ongoing jobs may finish with errors, and new jobs may not start or orchestrations that use that Secure Connector may fail to deploy.
This can be caused by a problem in the on-premise environment, such as an inaccessible network, a firewall block, the inbound port being used by another process, or other factors.
Diagnosing the problem
The following steps help determine whether the Secure Connector service is actually still running on the on-premise machine and, if so, whether it is listening. If it is both running and listening, something else is preventing the connection.
1. When the Live WMC shows the Secure Connector as stopped and more than 3-5 minutes have passed since the status last changed (make sure to refresh the page from the bottom), go to the machine where that Secure Connector is supposed to be running.
2. First, verify that the service is still running. Go to the Secure Connector folder (C:\Program Files\IBM\Secure_Connector_1.6.x.x.x) and open the file agent.pid. It should contain a number, which is the process ID (PID) of the Secure Connector. Next, open the Windows Task Manager (Windows Start -> Run "taskmgr"), display the PID column, and sort the list by that column. If the process is running, the PID from the agent.pid file should appear in the list; the process name is normally Java.
3. If the PID shows up in the Task Manager, the service is running. Next, determine whether it is still listening for communication from the cloud. To check this, open a browser on that machine and enter the following URL:
(*Port 1080 may differ depending on the Secure Connector "Listening on Port" setting; in some cases it is 2500.)
If the service is up and listening, the result should be a WSDL file, which is XML data. If you get an HTTP error code instead, the service is not listening.
4. Run "netstat -ano" from a command prompt and verify that the client port is in the LISTENING and ESTABLISHED states with the SC PID. If it is not, use the PID that netstat reports for the port to identify which process is occupying it, terminate that process, and restart the SC.
5. If the checks above are fine, the service is up and listening. If the cloud management console still shows it as stopped (remember to refresh the Live WMC), run the "ping" and "tracert" commands from the Secure Connector Windows machine to the Live gateway (gateway.castiron.com) to see whether it is a networking issue. Normally, the output of both commands should at least show the IP address once the hostname (gateway.castiron.com) is resolved, even if that IP address does not respond. The "tracert" output further shows whether the Secure Connector machine can reach external addresses.
6. Look at the Secure Connector logs (messages.log, agent.log and debug.zip) to see what error it is actually generating. Ideally, the logs should be generated at debug level, which can be configured as follows:
a) Go to Secure Connector installation directory.
b) Open file “localConfig_ws.xml” and change the value “<debug>false</debug>” to “<debug>true</debug>”.
c) Restart the Secure Connector.
d) Collect the "debug.zip" file from the Secure Connector installation directory.
e) Collect the files under the log directory of the Secure Connector installation folder.
7. Collect the IDs of failed jobs in the Live WMC that may be related to the SC issue.
8. Collect other information from the time of the issue, such as the number of concurrent jobs, the run time of long-running jobs, and so on.
If the issue persists or occurs intermittently, send the results of each step and the collected data to IBM Support for further analysis.
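The process check in step 2 can also be scripted. The following is a minimal Python sketch, assuming Python 3.8+ on the Secure Connector machine and that agent.pid contains only the PID. It reads agent.pid and asks the standard Windows `tasklist` command whether that PID exists. The helper names are hypothetical, and the "1.6.x.x.x" in the path is a placeholder that must be replaced with your installed version.

```python
import csv
import io
import subprocess
from pathlib import Path
from typing import Optional

# Placeholder path: replace "1.6.x.x.x" with your installed version.
SC_DIR = Path(r"C:\Program Files\IBM\Secure_Connector_1.6.x.x.x")


def read_agent_pid(pid_file: Path) -> int:
    """Return the process ID written in agent.pid (first token in the file)."""
    return int(pid_file.read_text().split()[0])


def image_name_for_pid(tasklist_csv: str, pid: int) -> Optional[str]:
    """Find `pid` in `tasklist /FO CSV /NH` output and return its image name, or None."""
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Matching rows look like: "java.exe","4321","Services","0","250,000 K"
        if len(row) >= 2 and row[1].strip() == str(pid):
            return row[0]
    return None  # e.g. tasklist printed "INFO: No tasks are running ..."


def check_sc_process(sc_dir: Path = SC_DIR) -> Optional[str]:
    """Windows only: return the image name for the PID in agent.pid, or None if not running."""
    pid = read_agent_pid(sc_dir / "agent.pid")
    out = subprocess.run(
        ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return image_name_for_pid(out, pid)


if __name__ == "__main__":
    name = check_sc_process()
    print(f"Secure Connector process is running as {name}" if name else "PID from agent.pid is not running")
```

Parsing the CSV form of `tasklist` avoids relying on column widths in its default table output.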
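The browser check in step 3 can be approximated in Python as well. This sketch assumes you pass in the URL from step 3, which is not reproduced here. It treats the response as a WSDL file if the XML root element is a WSDL 1.1 `definitions` or WSDL 2.0 `description` element. That heuristic and the function names are illustrative assumptions, not part of the Secure Connector product.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def looks_like_wsdl(body: bytes) -> bool:
    """True if body is XML whose root is <definitions> (WSDL 1.1) or <description> (WSDL 2.0)."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    local_name = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix, if any
    return local_name in ("definitions", "description")


def probe_listener(url: str, timeout: float = 10.0) -> Tuple[Optional[int], bool]:
    """Return (HTTP status, body looks like WSDL); status is None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, looks_like_wsdl(resp.read())
    except urllib.error.HTTPError as err:  # the server answered with an HTTP error code
        return err.code, False
    except OSError:  # URLError, refused connection, timeout: no HTTP answer at all
        return None, False
```

A result such as `(200, True)` corresponds to the expected WSDL response. An HTTP error code or `None` suggests that the service is not listening on that port.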
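For step 4, you can filter the `netstat -ano` output by port and compare the owning PIDs against the Secure Connector PID. This sketch assumes English-language Windows output, with columns Proto, Local Address, Foreign Address, State and PID, and no State column on UDP rows. `SC_PID` and `PORT` are example values that you replace with the PID from agent.pid and your "Listening on Port" value.

```python
import subprocess
from typing import List, NamedTuple, Set


class SocketRow(NamedTuple):
    proto: str
    local: str
    foreign: str
    state: str  # empty for UDP rows, which have no State column
    pid: int


def parse_netstat_ano(text: str) -> List[SocketRow]:
    """Parse English-language Windows `netstat -ano` output into rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[0] == "TCP":
            rows.append(SocketRow(parts[0], parts[1], parts[2], parts[3], int(parts[4])))
        elif len(parts) == 4 and parts[0] == "UDP":
            rows.append(SocketRow(parts[0], parts[1], parts[2], "", int(parts[3])))
        # header and blank lines fall through and are ignored
    return rows


def rows_on_port(rows: List[SocketRow], port: int) -> List[SocketRow]:
    """Rows whose local address ends in :port (covers IPv4 and [::]:port)."""
    return [r for r in rows if r.local.endswith(f":{port}")]


def other_pids_on_port(rows: List[SocketRow], port: int, sc_pid: int) -> Set[int]:
    """PIDs other than the Secure Connector's that hold the port."""
    return {r.pid for r in rows_on_port(rows, port) if r.pid != sc_pid}


if __name__ == "__main__":
    SC_PID, PORT = 4321, 1080  # example values: use your agent.pid value and listening port
    out = subprocess.run(["netstat", "-ano"], capture_output=True, text=True, check=True).stdout
    mine = rows_on_port(parse_netstat_ano(out), PORT)
    for r in mine:
        print(r.proto, r.local, r.foreign, r.state, r.pid)
    print("Other PIDs on the port:", other_pids_on_port(mine, PORT, SC_PID) or "none")
```

Any PID reported in "Other PIDs on the port" identifies the process that step 4 suggests terminating before you restart the SC.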
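The network check in step 5 separates name resolution from reachability. This sketch resolves gateway.castiron.com with the standard `socket` module and then runs the Windows `ping` and `tracert` commands. The function names are hypothetical, and the ping count of 4 is an arbitrary choice.

```python
import socket
import subprocess
from typing import Optional

GATEWAY = "gateway.castiron.com"


def resolve(host: str) -> Optional[str]:
    """Return the IPv4 address for host, or None if name resolution fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def gateway_checks(host: str = GATEWAY) -> None:
    """Windows only: report DNS resolution, then show ping and tracert output."""
    ip = resolve(host)
    print(f"{host} -> {ip}" if ip else f"{host} did not resolve; check DNS first")
    # Windows syntax: ping -n <count>. tracert may take a minute or more to finish.
    for cmd in (["ping", "-n", "4", host], ["tracert", host]):
        print(">", " ".join(cmd))
        print(subprocess.run(cmd, capture_output=True, text=True).stdout)


if __name__ == "__main__":
    gateway_checks()
```

If `resolve` returns `None`, the problem lies in DNS rather than in the route. If the name resolves but `tracert` stops early, the route to the gateway is being blocked.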
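You can also script the edit in step 6 b). This minimal sketch assumes that localConfig_ws.xml is UTF-8 encoded and that the element is written exactly as `<debug>false</debug>`. It saves a `.bak` copy before changing the file. The function name is hypothetical. Restart the Secure Connector afterwards as described in step 6 c).

```python
import shutil
from pathlib import Path

OFF, ON = "<debug>false</debug>", "<debug>true</debug>"


def enable_debug(config: Path) -> bool:
    """Turn on debug logging in localConfig_ws.xml; return True if the file was changed."""
    text = config.read_text(encoding="utf-8")
    if OFF not in text:
        return False  # already enabled, or the element is written differently
    shutil.copy2(config, config.with_name(config.name + ".bak"))  # keep the original
    config.write_text(text.replace(OFF, ON), encoding="utf-8")
    return True
```

Example call, with the version placeholder still to be replaced: `enable_debug(Path(r"C:\Program Files\IBM\Secure_Connector_1.6.x.x.x\localConfig_ws.xml"))`. A plain text replacement leaves the rest of the file untouched. Re-serializing it through an XML parser could reorder or reformat it.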
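For steps 6 d) and e), you can bundle the collected files into one archive for IBM Support. This article does not name the log directory, so the `logs` default below is a hypothetical name that you should change to match your installation. The function name is also hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def bundle_sc_diagnostics(sc_dir: Path, out_zip: Path, log_dir_name: str = "logs") -> List[str]:
    """Zip debug.zip plus every file under the log directory; return the archive member names."""
    candidates = [sc_dir / "debug.zip"]
    log_dir = sc_dir / log_dir_name  # "logs" is an assumed directory name
    if log_dir.is_dir():
        candidates += sorted(p for p in log_dir.rglob("*") if p.is_file())
    members = []
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in candidates:
            if path.is_file():  # skip debug.zip if it has not been generated yet
                arcname = path.relative_to(sc_dir).as_posix()
                zf.write(path, arcname)
                members.append(arcname)
    return members
```

Write `out_zip` outside the installation directory so that the archive does not pick itself up on a later run.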