IBM Support

PM22587: WEBSPHERE CONTROLLER HANG

Fixes are available

6.1.0.35: Java SDK 1.5 SR12 FP2 Cumulative Fix for WebSphere
7.0.0.15: WebSphere Application Server V7.0 Fix Pack 15 for AIX
7.0.0.15: Java SDK 1.6 SR9 Cumulative Fix for WebSphere Application Server
7.0.0.15: WebSphere Application Server V7.0 Fix Pack 15 for HP-UX
7.0.0.15: WebSphere Application Server V7.0 Fix Pack 15 for IBM i
7.0.0.15: WebSphere Application Server V7.0 Fix Pack 15 for Linux
7.0.0.15: WebSphere Application Server V7.0 Fix Pack 15 for Solaris
7.0.0.15: WebSphere Application Server V7.0 Fix Pack 15 for Windows
6.1.0.37: Java SDK 1.5 SR12 FP3 Cumulative Fix for WebSphere
7.0.0.17: WebSphere Application Server V7.0 Fix Pack 17
7.0.0.17: Java SDK 1.6 SR9 FP1 Cumulative Fix for WebSphere Application Server
7.0.0.19: WebSphere Application Server V7.0 Fix Pack 19
7.0.0.21: WebSphere Application Server V7.0 Fix Pack 21
7.0.0.23: WebSphere Application Server V7.0 Fix Pack 23
7.0.0.25: WebSphere Application Server V7.0 Fix Pack 25
7.0.0.27: WebSphere Application Server V7.0 Fix Pack 27
7.0.0.29: WebSphere Application Server V7.0 Fix Pack 29
6.1.0.47: WebSphere Application Server V6.1 Fix Pack 47
7.0.0.31: WebSphere Application Server V7.0 Fix Pack 31
7.0.0.27: Java SDK 1.6 SR13 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.33: WebSphere Application Server V7.0 Fix Pack 33
7.0.0.35: WebSphere Application Server V7.0 Fix Pack 35
7.0.0.37: WebSphere Application Server V7.0 Fix Pack 37
7.0.0.39: WebSphere Application Server V7.0 Fix Pack 39
7.0.0.41: WebSphere Application Server V7.0 Fix Pack 41
7.0.0.43: WebSphere Application Server V7.0 Fix Pack 43
7.0.0.45: WebSphere Application Server V7.0 Fix Pack 45
6.1.0.39: Java SDK 1.5 SR12 FP4 Cumulative Fix for WebSphere Application Server
6.1.0.41: Java SDK 1.5 SR12 FP5 Cumulative Fix for WebSphere Application Server
6.1.0.43: Java SDK 1.5 SR13 Cumulative Fix for WebSphere Application Server
6.1.0.45: Java SDK 1.5 SR14 Cumulative Fix for WebSphere Application Server
6.1.0.47: Java SDK 1.5 SR16 Cumulative Fix for WebSphere Application Server
7.0.0.19: Java SDK 1.6 SR9 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.21: Java SDK 1.6 SR9 FP2 Cumulative Fix for WebSphere
7.0.0.23: Java SDK 1.6 SR10 FP1 Cumulative Fix for WebSphere
7.0.0.25: Java SDK 1.6 SR11 Cumulative Fix for WebSphere Application Server
7.0.0.27: Java SDK 1.6 SR12 Cumulative Fix for WebSphere Application Server
7.0.0.29: Java SDK 1.6 SR13 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.45: Java SDK 1.6 SR16 FP60 Cumulative Fix for WebSphere Application Server
7.0.0.31: Java SDK 1.6 SR15 Cumulative Fix for WebSphere Application Server
7.0.0.35: Java SDK 1.6 SR16 FP1 Cumulative Fix for WebSphere Application Server
7.0.0.37: Java SDK 1.6 SR16 FP3 Cumulative Fix for WebSphere Application Server
7.0.0.39: Java SDK 1.6 SR16 FP7 Cumulative Fix for WebSphere Application Server
7.0.0.41: Java SDK 1.6 SR16 FP20 Cumulative Fix for WebSphere Application Server
7.0.0.43: Java SDK 1.6 SR16 FP41 Cumulative Fix for WebSphere Application Server
Obtain the fix for this APAR.

APAR status

  • Closed as program error.

Error description

  • A WebSphere server becomes unresponsive.
    
    Review of the servant regions shows 26 threads stuck in
    bboosout(BBOOSOUT_Functions,ORB_Request_SharedMemberData*,lo
                +00000000              BBGBOA
    ORB_Request::comm_outbound_request(unsigned int)
                +00000ADE              BBGBOA
    
    and threads stuck in
    bboosout(BBOOSOUT_Functions,ORB_Request_SharedMemberData*,lo
                +00000000              BBGBOA
    ORB_Request::comm_outbound_locate(unsigned int)
                +000015B8              BBGBOA
    
    Review of the controller region shows 1 ACRW thread in
    
    Entry      Offset     Function
    -----      ------     --------
    30a00f38   00000000   bbocfasy(long long,int*,int*,int*)
    309b6d98   00000708   CF_TCP_Connection::send_data(bool,bool)
    309b6128   00000b02   CF_TCP_Connection::async_send_msg(msghdr*,int,bool,bool,int)
    309fa980   000001be   CF_TCP_Request::write(void*,msghdr*,int,int,bool,bool)
    309e5728   000007fa   Java_com_ibm_ws390_tcp_channel_ZAioTCPChannelCPPUtilities_write
    7bb2d1b0   844d2e50   RUNCALLINMETHOD
    7bb63488   0000003e   gpProtectedRunCallInMethod
    7bb60fe0   0000001c   signalProtectAndRunGlue
    7b92eb08   00000356   j9sig_protect
    7bb659c0   000000a4   gpCheckCallin
    7bb5ea20   00000072   callStaticObjectMethodA
    30211de8   000012de   ZIOPChannelBridge::send_outbound_request(ORB_Request*,void*,int,volatile int*,int)
    
    The other 24 ACRW threads are stuck behind the one above with
    this call stack:
    
    Entry      Offset     Function
    -----      ------     --------
    2f7aed78   000020c4   CEEOPCT
    2fa90c68   000000c0   pthread_cond_timedwait
    3215f728   00000412   monitor_wait
    32162398   00000014   j9thread_monitor_wait_timed
    7bba2660   00000010   callMonitorWaitTimed
    7bb3c4e0   0000030a   Z_OBJECTMONITORENTERBLOCKING
    7bb91b10   00000014   objectMonitorEnterBlocking
    7bb2d1b0   ffc4ca14   RUNCALLINMETHOD
    7bb63488   0000003e   gpProtectedRunCallInMethod
    7bb60fe0   0000001c   signalProtectAndRunGlue
    7b92eb08   00000356   j9sig_protect
    7bb659c0   000000a4   gpCheckCallin
    7bb5ea20   00000072   callStaticObjectMethodA
    30211de8   000012de   ZIOPChannelBridge::send_outbound_request(ORB_Request*,void*,int,volatile int*,int)
    
    A javacore from this same controller indicates that the first
    TCB holds a flat lock while sending data on the connection,
    and that the send (bbocfasy) has blocked.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All users of IBM WebSphere Application      *
    *                  Server V6.1.0                               *
    ****************************************************************
    * PROBLEM DESCRIPTION: WebSphere Application Server for z/OS   *
    *                      Controller becomes unresponsive while   *
    *                      sending a large number of IIOP          *
    *                      requests to a single remote server.     *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    Outbound IIOP requests from a WebSphere Application Server for
    z/OS server are handled by a single thread pool.  If a large
    number of IIOP requests are being sent to a single remote
    server, all of the threads in this pool can become occupied
    sending data.  Because sends on the connection are serialized
    by a lock, only one thread at a time is able to send its data.
    If there is an external delay sending data (such as TCP/IP not
    responding for a long time), the entire thread pool can become
    stuck waiting to send data, effectively freezing the
    Controller.
    
    This problem can be spotted by looking at thread stacks in a
    dump.  A single thread will have a stack with something
    similar to this at the top:
    
    at com/ibm/ws390/tcp/channel/ZAioTCPChannelCPPUtilities.write
    at com/ibm/ws/tcp/channel/impl/ZAioTCPWriteRequestContextImpl
            .writeCommon
    at com/ibm/ws/tcp/channel/impl/ZAioTCPWriteRequestContextImpl
            .write
    at com/ibm/ws/ssl/channel/impl/SSLWriteServiceContext.write
    at com/ibm/ws390/channel/ziop/ZIOPOutboundMessageHandler
            .writeMessage
    
    Numerous other threads will have a stack with something similar
    to this at the top:
    
    at com/ibm/ws390/channel/ziop/ZIOPOutboundMessageHandler
            .writeMessage
    at com/ibm/ws390/channel/ziop/ZIOPConnectionContext
            .writeMessage
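
    The check described above can be sketched in code.  This is
    only an illustration, not an IBM tool: the class
    SendHangDetector and its methods are hypothetical, and it
    assumes the thread stacks have already been extracted from the
    dump into lists of frame names, top frame first, with wrapped
    lines joined and the leading "at " removed.  The threshold for
    "numerous" waiting threads is left to the caller.

```java
import java.util.List;

/** Hypothetical helper: classifies thread stacks already extracted from a dump. */
final class SendHangDetector {
    // Frame names copied from the stack excerpts in this APAR.
    static final String NATIVE_WRITE =
            "com/ibm/ws390/tcp/channel/ZAioTCPChannelCPPUtilities.write";
    static final String WRITE_MESSAGE =
            "com/ibm/ws390/channel/ziop/ZIOPOutboundMessageHandler.writeMessage";

    /** Counts the stacks whose top frame equals the given frame name. */
    static long countTopFrame(List<List<String>> stacks, String frame) {
        return stacks.stream()
                .filter(s -> !s.isEmpty() && s.get(0).equals(frame))
                .count();
    }

    /**
     * Returns true when exactly one thread is in the native write and at
     * least minWaiters threads are at writeMessage. minWaiters is an
     * assumption chosen by the caller, not a value from the APAR.
     */
    static boolean looksLikeSendHang(List<List<String>> stacks, int minWaiters) {
        return countTopFrame(stacks, NATIVE_WRITE) == 1
                && countTopFrame(stacks, WRITE_MESSAGE) >= minWaiters;
    }
}
```

    A match only suggests that the dump is worth comparing with
    the stacks shown in this APAR; it is not a diagnosis.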
    

Problem conclusion

  • Code has been changed to queue the sending of outbound requests
    so that a single thread can handle the sending of requests and
    additional threads can add data to the queue instead of waiting
    for previous sends to complete.
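
    The general pattern behind this change can be sketched as
    follows.  This is a minimal illustration of a queued sender,
    not the WebSphere implementation: the class QueuedSender, its
    methods, and the transport callback are all hypothetical.
    Request threads add data to a queue and return at once, and a
    single dedicated thread is the only one that can block on the
    send.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch: one sender thread drains a queue fed by many request threads. */
final class QueuedSender implements AutoCloseable {
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final Thread sender;

    /** transport stands in for the real (possibly slow) network write. */
    QueuedSender(Consumer<byte[]> transport) {
        sender = new Thread(() -> {
            try {
                while (true) {
                    // Only this thread waits on the transport.
                    transport.accept(queue.take());
                }
            } catch (InterruptedException e) {
                // close() was called: stop draining the queue.
            }
        }, "outbound-sender");
        sender.setDaemon(true);
        sender.start();
    }

    /** Called by request threads: enqueue and return without waiting for the send. */
    void send(byte[] message) {
        queue.add(message);
    }

    /** Number of messages still waiting to be sent. */
    int pending() {
        return queue.size();
    }

    @Override
    public void close() {
        sender.interrupt();
    }
}
```

    The queue in this sketch is unbounded, so a stalled transport
    lets queued data grow without limit.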
    
    APAR PM22587 requires changes to documentation.
    
    NOTE: Periodically, we refresh the documentation on our
    Web site, so the changes might have been made before you
    read this text. To access the latest on-line
    documentation, go to the product library page at:
    
    http://www.ibm.com/software/webservers/appserv/library
    
    The following change to the WebSphere Application Server
    Version 6.1 Information Center will be made available in
    December, 2010.
    
    The following description of the
    iiop_max_send_queue_megsize environment variable will be
    added to the "Application server custom properties that
    are unique for the z/OS platform" topic:
    
    iiop_max_send_queue_megsize
    
    Specifies, in megabytes, the maximum amount of data that
    can be queued up to send asynchronously over a single IIOP
    connection. If the amount of data queued exceeds the
    specified value, future IIOP requests over this connection
    fail with a C9C26A4D minor code. The minimum value for this
    property is 0, which indicates that there is no limit to
    the amount of data that can be queued for sending. The
    maximum value is 2048.
    
    Data Type       Integer
    Default         0
    Used by Daemon  No
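
    The value rules described above can be sketched as follows.
    This is an illustration only, not product code: the class
    SendQueueLimit and its method are hypothetical, and the
    conversion of one megabyte to 1024 * 1024 bytes is an
    assumption that the description does not state.

```java
/** Hypothetical sketch of the documented value rules for iiop_max_send_queue_megsize. */
final class SendQueueLimit {
    static final int MIN_MEGS = 0;    // 0 means no limit on queued data
    static final int MAX_MEGS = 2048; // documented maximum value

    // Assumption: one megabyte is 2^20 bytes.
    private static final long BYTES_PER_MEG = 1024L * 1024L;

    private final long limitBytes; // 0 = unlimited

    SendQueueLimit(int megs) {
        if (megs < MIN_MEGS || megs > MAX_MEGS) {
            throw new IllegalArgumentException(
                    "value must be between " + MIN_MEGS + " and " + MAX_MEGS + ": " + megs);
        }
        this.limitBytes = megs * BYTES_PER_MEG; // long arithmetic, no overflow at 2048
    }

    /** True when the queued data exceeds the limit, so further requests would fail. */
    boolean rejectsNewRequests(long queuedBytes) {
        return limitBytes != 0 && queuedBytes > limitBytes;
    }
}
```

    With the default of 0 this check never rejects a request,
    which matches the "no limit" behavior described above.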
    
    APAR PM22587 is currently targeted for inclusion in
    Service Level (Fix Pack) 6.1.0.35 of WebSphere
    Application Server V6.1.
    
    Please refer to URL:
    http://www.ibm.com/support/docview.wss?rs=404&uid=swg27006970
    for Fix Pack availability.
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM22587

  • Reported component name

    WEBSPHERE FOR Z

  • Reported component ID

    5655I3500

  • Reported release

    610

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2010-09-15

  • Closed date

    2010-10-11

  • Last modified date

    2011-01-04

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    PM22988

Fix information

  • Fixed component name

    WEBSPHERE FOR Z

  • Fixed component ID

    5655I3500

Applicable component levels

  • R610 PSY UK62676

       UP10/12/17 P F012

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.



Document information

More support for: WebSphere Application Server for z/OS
General

Software version: 6.1

Reference #: PM22587

Modified date: 04 January 2011