IBM Support

IT38745: Highly available Managed File Transfer agent uses 100% CPU while trying to reconnect to its agent queue manager.


APAR status

  • Closed as program error.

Error description

  • An instance of an MQ Managed File Transfer highly available (MFT
    HA) agent is running on Linux.
    
    If the agent instance becomes disconnected from its agent queue
    manager while it is running, the CPU usage of the agent instance
    process stays at 100% until the instance reconnects.
    

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This issue affects all users of MQ Managed File Transfer highly
    available (MFT HA) agents.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    A highly available Managed File Transfer agent consists of:
    
    - One active instance.
    - One or more standby instances.
    
    The first instance of the agent to start locks a shared resource
    (the SYSTEM.FTE.HA.agent_name queue on the agent queue manager).
    When the other instances start, they fail to obtain the lock and
    become standby instances. Each standby instance then attempts to
    take the lock at regular intervals, as specified by the agent
    property standbyPollInterval. When a standby instance obtains
    the lock, it becomes the active instance.
    
    After the active instance has locked the shared resource, it
    performs its normal startup operations and then starts
    processing managed transfers.
    
    If the agent queue manager was stopped while the highly
    available agent was running, the following sequence of events
    occurred:
    
    - The agent instances became disconnected and immediately tried
    to reconnect.
    - These reconnection attempts failed, because the queue manager
    was not running.
    - The agent instances immediately tried to reconnect again.
    - These attempts also failed, because the queue manager was
    still unavailable.
    - The instances immediately tried to reconnect for a third time.
    
    And so on. Because the instances retried the connection to the
    agent queue manager in a tight loop, with no delay between
    attempts, they consumed excessive CPU.
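The active/standby election described in the problem description can be sketched as follows. This is a minimal illustration under loud assumptions: an in-process `threading.Lock` stands in for the shared SYSTEM.FTE.HA.agent_name queue, and the helper names `start_instance` and `standby_poll` are hypothetical. The real agent does not lock its queue this way; the sketch only shows the "first instance wins, standbys poll every standbyPollInterval" behaviour.

```python
import threading
import time
from typing import Callable

# Assumption: a threading.Lock stands in for the SYSTEM.FTE.HA.agent_name
# queue. This is NOT how the real MFT agent locks the queue.


def start_instance(shared_lock: threading.Lock) -> str:
    """Try once to take the shared lock; the first instance to succeed is active."""
    if shared_lock.acquire(blocking=False):
        return "active"
    return "standby"


def standby_poll(
    shared_lock: threading.Lock,
    standby_poll_interval: float,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll for the lock, waiting standby_poll_interval seconds between polls.

    Returns True when this standby instance takes the lock (and so becomes
    the active instance), or False after max_polls unsuccessful polls.
    """
    for _ in range(max_polls):
        if shared_lock.acquire(blocking=False):
            return True
        sleep(standby_poll_interval)
    return False


if __name__ == "__main__":
    lock = threading.Lock()
    print(start_instance(lock))  # first instance: active
    print(start_instance(lock))  # second instance: standby

    # Simulate the active instance going away: release the lock during the
    # standby's first wait instead of really sleeping for 5 seconds.
    def fake_sleep(seconds: float) -> None:
        if lock.locked():
            lock.release()

    print(standby_poll(lock, standby_poll_interval=5.0, max_polls=3, sleep=fake_sleep))  # True
```

Injecting `sleep` keeps the sketch testable without real delays; in the demo the standby fails its first poll, the simulated active instance releases the lock during the wait, and the second poll promotes the standby.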
    

Problem conclusion

  • To resolve this issue, MQ Managed File Transfer highly available
    agents have been updated to wait for the interval specified by
    the agent property:
    
    standbyPollInterval
    
    before each attempt to reconnect to the agent queue manager. The
    default value of this property is 5 seconds, so by default an
    agent waits 5 seconds between reconnection attempts. Because the
    agent no longer retries in a tight loop, the CPU usage of the
    agent instance processes is reduced.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.2 LTS   9.2.0.5
    v9.x CD    9.2.5
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
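The fixed reconnection behaviour can be sketched as a retry loop that waits for standbyPollInterval before each attempt. This is a minimal Python illustration, not the agent's implementation: `connect` is a hypothetical callable assumed to raise `ConnectionError` while the queue manager is down, the interval is taken in seconds as in the description above, and `max_attempts` exists only to make the sketch testable. The second helper, `simulated_attempt_count` (also hypothetical), contrasts the original tight loop with the 5-second default on a simulated clock, assuming an illustrative 1 ms per failed attempt; the numbers it produces are arithmetic, not measurements.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def reconnect_with_interval(
    connect: Callable[[], T],
    standby_poll_interval: float,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `connect` until it succeeds, waiting before every attempt.

    `connect` is a hypothetical stand-in for connecting to the agent queue
    manager and is assumed to raise ConnectionError on failure. The wait is
    what stops the loop from spinning and consuming CPU.
    """
    attempts = 0
    while True:
        sleep(standby_poll_interval)  # wait before each reconnection attempt
        attempts += 1
        try:
            return connect()
        except ConnectionError:
            if max_attempts is not None and attempts >= max_attempts:
                raise


def simulated_attempt_count(outage_ms: int, attempt_cost_ms: int, retry_delay_ms: int) -> int:
    """Count failed reconnection attempts during an outage on a simulated clock.

    Each failed attempt plus its wait advances the clock by
    attempt_cost_ms + retry_delay_ms. retry_delay_ms=0 models the original
    tight loop; integer milliseconds keep the arithmetic exact.
    """
    if attempt_cost_ms + retry_delay_ms <= 0:
        raise ValueError("each attempt must advance the simulated clock")
    clock_ms = 0
    attempts = 0
    while clock_ms < outage_ms:
        attempts += 1
        clock_ms += attempt_cost_ms + retry_delay_ms
    return attempts


if __name__ == "__main__":
    # 60-second outage, hypothetical 1 ms per failed attempt.
    print(simulated_attempt_count(60_000, 1, 0))      # 60000 (tight loop)
    print(simulated_attempt_count(60_000, 1, 5_000))  # 12 (5-second interval)

    # Demo with a connect() that fails twice; sleeping is stubbed out.
    outcomes = iter([False, False, True])

    def flaky_connect() -> str:
        if not next(outcomes):
            raise ConnectionError("queue manager not running")
        return "connected"

    print(reconnect_with_interval(flaky_connect, 5.0, sleep=lambda s: None))  # connected
```

Under these assumptions, a 60-second outage costs about 60,000 back-to-back attempts with no delay, but only 12 with a 5-second wait, which is why pacing the retries brings the CPU usage down.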
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT38745

  • Reported component name

    MQ BASE V9.2

  • Reported component ID

    5724H7281

  • Reported release

    920

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-10-20

  • Closed date

    2021-11-24

  • Last modified date

    2021-11-24

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MQ BASE V9.2

  • Fixed component ID

    5724H7281

Applicable component levels

  • Product

    IBM MQ (SSYHRD)

  • Version

    920

  • Platform

    Platform Independent

Document Information

Modified date:
25 November 2021