APAR status
Closed as program error.
Error description
This APAR addresses two problems that arise when more then one IMS Connect on the same LPAR attempt to listen on the same ports but the ports are *NOT* defined to TCP/IP as SHAREPORTs. * Problem One: When more then one IMS Connect on the same LPAR attempt to listen on the same ports but the ports are *NOT* defined to TCP/IP as SHAREPORTs, here's what happens: The first IMS Connect to come up is successful in listening on the ports and it puts out the following message: HWSS0790I LISTENING ON PORT=xxxx STARTED; M=SOC2. When the rest of the IMS Connects are started they *too* will put out the same HWSS0790I message implying that they too are listening on the same ports but they are not. Infact, VIEWHWS for the other IMS Connect shows: HWSC0001I PORT=xxxx STATUS=NOT ACTIVE NETSTAT CONN output also confirms that the other IMS Connects are not listening on those ports. So why are we putting out the HWSS0790I LISTENING ON PORT ... message? Analysis reveals The Open Port process begins with module HWSSOC20. This module basically creates a seperate thread to do the Open Ports and then issues the HWSS0790I LISTENING ON ... messages. It has no idea if the Open Port process succeded or not. So the HWSS0790I message is premature. The sepearte Thread that is created has the following module flow: HWSSSCH0 -> (via a driver call) HWSSDOTD. It is HWSSDOTD that actually gets a socket, Binds it to the Port, and then does a Listen call. A review of the HWSTRACEs showed that in the case of the other IMS Connects the BIND call fails with RC=x'45B' & RSN=x'744C7247'. The RC=x'45B' or decimal 1115 is EADDRINUSE I.E. The address is already in use. This is the error message that one would get when an application attempts to BIND to a PORT that some other application is already using and the Port in NOT defined to TCP/IP as SHAREPORT. So we shouldn't be issuing the HWSS0790I message from HWSSOC20. We should issue the message either from HWSSSCH0 or HWSSDOTD after the open port process is successful & if it is not successful then issue an appropriate message. * Problem Two: After the customer brought up more then one IMS Connect (on the same LPAR) attempting to listen on the same ports but the ports were *not* defined to TCP/IP as SHAREPORTs, the customer issued a STOPPORT command on the first IMS CONNECT. The command is successful on the first IMS Connect & the following messages are issued: HWSP1415E TCP/IP SOCKET FUNCTION CALL FAILED; F=ACCEPT4 , R=-1, E=1152, M=SDCO HWSS0770I LISTENING ON PORT=xxxx TERMINATED; M=SSCH But on the other IMS Connect we immediately see the following messages: HWSP1415E TCP/IP SOCKET FUNCTION CALL FAILED; F=LISTEN , R=-1, E=1115, M=SDOT HWSS0771W LISTENING ON PORT=3051 FAILED; R=-1, S=SOCKFAIL, M=SSCH These messages are misleading. It appears as if we issue the STOPPORT on one IMS Connect & the other IMS Connect also gets affected. Furthermore, the second IMS Connect was not listening on the port to begin with. So why are we getting this message on the second IMS Connect?? * Analysis reveals that after the Bind or Listen call *FAILS* (for the second IMS Connect with RC=x'45B'), the code closes the socket, waits for 1 second and then tries the whole proces all over again i.e. get a socket, BIND call & then the Listen call. And we keep doing this until the open port process completes or upto 2000 times (whichever comes first). Clearly we shouldn't be doing this loop if we have gotten a RC=x'45B' because if the Port is in use by another application we will keep getting RC=x'45B' (and thats exactly what we see in the traces). In any case, in the second IMS Connect, we have this loop going. When the STOPPORT command is issued on the first IMS Connect, the Port now becomes available and in the second IMS Connect our loop becomes successful i.e. the get socket call, BIND & LISTEN calls are successful. *BUT* there is a bug in HWSSDOTD. At label USS_END we set R15 to the value of RETCODE which was set when the BIND call failed (in previous loops). RETCODE is not set (or reset) when the BIND & LISTEN calls are successful. So we return with R15 which has a bad return code. Hence HWSSDOTD issues HWSP1415E TCP/IP SOCKET FUNCTION CALL FAILED message. Then we return to HWSSSCH0 & it issues the HWSS0771W LISTENING ... FAILED message.
Local fix
Problem summary
**************************************************************** * USERS AFFECTED: All IMS V910 IMS Connect users. * **************************************************************** * PROBLEM DESCRIPTION: HWSS0790I LISTENING ON PORT=xxxx * * STARTED message is issued prior to IMS * * Connect actually establishing * * connection to the port. If the BIND is * * not successful the message is * * misleading. Furthermore, notification * * of the port BIND failure via message * * HWSS0771W is not sent until 33 minutes * * later. * * * * Secondly if the port BIND is successful * * after a subsequent attempt via internal * * retry logic, the unsuccessful HWSS0771W * * message is returned. * **************************************************************** * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF * **************************************************************** IMS Connect issues message 'HWSS0790I LISTENING ON PORT XXXX STARTED ...' prior to attempting to bind to the port. The message can be misleading if the bind fails. Notification of the bind failure via message HWSS0771W will not be provided until 33 minutes later after exhausting 2000 retries with 1 second delays between each retry attempt. Additionally, if a subsequent port BIND attempt via the retry logic completes successfully an internal logic error returns a failing return code resulting in the HWSS0771W error message. Note: The reported failure resulted from an unsuccessful bind attempt. However, the failing scenario is also true when SETSOCKOPT or LISTEN service call fail. Keyword(s): MSGHWSS0790I MSGHWSS0771W
Problem conclusion
AIDS: RIDS/SYS RIDS/CNTRL SYS CNTRL DEP: NONE GEN: *** END IMS KEYWORDS *** Message HWSS0790I has been removed from part HWSSOC20. The message has been relocated to part HWSSDOTD in logic that can determine the state of the port before the message is issued. Logic has been corrected in part HWSSDOTD. HWSS0790I message is not issued until after the BIND to the port is successful. If the BIND is not successful internal loop logic will retry up to 300 times (reduced from 2000). If unsuccessful after 300 attempts messages HWSP1415E and HWSS0771W are issued. A logic error that results in an residual non-zero return code to be detected after a successful BIND during a retry has been corrected. This maintenance also addresses the same issue in case of LISTEN or SETSOCKOPT failures. An update the IMS Connect Guide and Reference (SC18-9287-XX) is made to indicate that the HWSS0790 is issued from part HWSSDOTD (e.g. .... M=SDOTD). Prior to the change the message is issued from part HWSSOC20 (e.g. M=SOC2) for TCP/IP ports. Chapter 16 IMS Connect Error Codes and Message: Message HWSS0790I changed the M=module code from SOC2 to SDOT old - HWSS0790I LISTENING ON PORT=portid STARTED; M=SOC2 new - HWSS0790I LISTENING ON PORT=portid STARTED; M=SDOT changed the module from 'SOC2 - HWSSOC20' to 'SDOT - HWSSDOTD Module: * SDOT - HWSSDOTD - An update the IMS Connect Guide and Reference (SC18-9287-XX) to the 'System Programmer Response: for HWSP1415I and HWSP1415E informational and error messages. The document referenced to determine the cause of the specified return code and error code ('TCP/IP Application Programming Interface Reference') applies to TCP/IP macro API calls only not for Unix System Services (USS) callable interface return code and reason codes. Since IMS Connect 9.1 requires TCP/IP at z/OS 1.4 or higher the referenced in the System Programmer response should refer to z/OS Unix System Services Messages and codes manual. System Programmer Response: <--BEFORE CHANGE For the possible cause of the specified return code and error code, see TCP/IP Application Programming Interface Reference. System Programmer Response: <--AFTER CHANGE For the possible cause of the specified return code and error code, see z/OS Unix System Services Message and Codes. However, for TCP/IP Macro API, see TCP/IP Application Programming Interface Reference.
Temporary fix
Comments
APAR Information
APAR number
PK15313
Reported component name
IMS V9
Reported component ID
5655J3800
Reported release
900
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2005-11-16
Closed date
2006-05-26
Last modified date
2006-10-04
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK14912
Modules/Macros
HWSSDOTD HWSSOC20
SC18928701 |
Fix information
Fixed component name
IMS V9
Fixed component ID
5655J3800
Applicable component levels
R900 PSY UK14912
UP06/06/09 P F606
[{"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"SSCVRBJ","label":"System Services"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"9.1","Edition":"","Line of Business":{"code":"","label":""}}]
Document Information
Modified date:
04 October 2006