IBM Support

IT37869: FDC containing a NullPointerException occurs if a managed transfer enters recovery during negotiation and times out

Subscribe to this APAR

By subscribing, you receive periodic emails alerting you to the status of the APAR, along with a link to the fix after it becomes available. You can track this item individually or track all items by product.

Notify me when this APAR changes.

Notify me when an APAR for this component changes.

 

APAR status

  • Closed as program error.

Error description

  • An MQ 9.2 Managed File Transfer agent is acting as the source
    agent for managed transfers, where the managed transfers have a
    transfer recovery timeout of 30 seconds.
    
    Intermittently, the managed transfers processed by the agent go
    into recovery during the negotiation phase because the
    destination agent is not running. This causes the source agent
    to write the following messages to its event log (output0.log):
    
      BFGTR0068I: Transfer ID: <transfer identifier>, to destination
    agent <destination agent>, is entering recovery, immediately,
    due to recoverable error: BFGTR0064I: The transfer sender timed
    out waiting for a response from the transfer receiver.
    
      BFGTR0081I: Transfer ID: '<transfer identifier>' is now in
    recovering state. The agent will wait for 30 seconds for the
    transfer to recover.
    
    30 seconds later, the destination agent still isn't running, and
    so the managed transfer times out. When this happens, the source
    agent writes the message:
    
      BFGSS0081E: Recovery of transfer ID: '<transfer_identifier>'
    timed out after 30 seconds. The managed transfer has been
    terminated.
    
    to its event log, and moves the managed transfer into a
    RecoveryTimedOut state.
    
    
    Shortly after the destination agent restarts, the source agent
    generates an FDC containing the information shown below:
    
    ...
    Thread:     38 (TransferCompletedWorker)
    Class:
    com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread
    Method:     processCommand
    Probe:      FFDC_001
    Cause:
    com.ibm.wmqfte.cmdhandler.FTECommandHandlerErrorException:
    BFGCH0066E: An internal error has occurred. An exception has
    been reported by the State Store. The exception is
    java.lang.NullPointerException
    com.ibm.wmqfte.cmdhandler.FTECommandHandlerErrorException:
    BFGCH0066E: An internal error has occurred. An exception has
    been reported by the State Store. The exception is
    java.lang.NullPointerException
    	at
    com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerStateStoreImpl.p
    rocessInternalCommand(FTECommandHandlerStateStoreImpl.java:...)
    	at
    com.ibm.wmqfte.cmdhandler.impl.MessageHandlerImpl.internalMessag
    eHandler(MessageHandlerImpl.java:...)
    	at
    com.ibm.wmqfte.cmdhandler.impl.MessageHandlerImpl.processCommand
    (MessageHandlerImpl.java:...)
    	at
    com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread.pro
    cessCommand(FTECommandHandlerWorkerThread.java:...)
    	at
    com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread.run
    (FTECommandHandlerWorkerThread.java:...)
    	at java.lang.Thread.run(Thread.java:...)
    	at com.ibm.wmqfte.thread.FTEThread.run(FTEThread.java:...)
    Caused by: java.lang.NullPointerException
    	at
    com.ibm.wmqfte.transfer.impl.TransferSenderRunnable.auditComplet
    e(TransferSenderRunnable.java:...)
    	at
    com.ibm.wmqfte.transfer.impl.TransferSenderImpl.onComplete(Trans
    ferSenderImpl.java:...)
    	at
    com.ibm.wmqfte.statestore.impl.FTEStateStoreImpl.sendForgetToFar
    End(FTEStateStoreImpl.java:...)
    	at
    com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerStateStoreImpl.p
    rocessInternalCommand(FTECommandHandlerStateStoreImpl.java:...)
    	... 6 more
    
    and the managed transfer gets stuck in the RecoveryTimedOut
    state.
    

Local fix

  • n/a
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This issue affects users of MQ 9.2 Managed File Transfer, who
    have enabled the transfer recovery timeout functionality for
    managed transfers.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    The MQ Managed File Transfer recovery timeout functionality
    provides a mechanism to automatically fail a managed transfer if
    it enters recovery, and stays in recovery for longer than the
    specified timeout period. Some information on this can be found
    in the topic:
    
     Transfer recovery timeout concepts
    
    in the MQ 9.2 section of IBM Documentation. The URI for the
    topic is:
    
    https://www.ibm.com/docs/en/ibm-mq/9.2?topic=transfers-transfer-
    recovery-timeout-concepts
    
    
    Now, if an agent was acting as the source agent for a managed
    transfer that had a transfer recovery timeout specified, and the
    managed transfer entered recovery during the transfer
    negotiation phase because the destination agent did not respond,
    then the source agent would write the following messages to its
    event log (output0.log):
    
      BFGTR0068I: Transfer ID: <transfer identifier>, to destination
    agent <destination agent>, is entering recovery, immediately,
    due to recoverable error: BFGTR0064I: The transfer sender timed
    out waiting for a response from the transfer receiver.
    
      BFGTR0081I: Transfer ID: '<transfer identifier>' is now in
    recovering state. The agent will wait for <timeout> seconds for
    the transfer to recover.
    
    If the managed transfer subsequently timed out due to the
    transfer recovery timeout, the source agent moved the managed
    transfer into a "RecoveryTimedOut" state, and wrote the message:
    
      BFGSS0081E: Recovery of transfer ID: '<transfer_identifier>'
    timed out after <timeout> seconds. The managed transfer has been
    terminated.
    
    to its event log.
    
    When the destination agent eventually responded to the transfer
    negotiation message, the source agent started to perform the
    required cleanup for the managed transfer before cancelling it.
    As part of this processing, it tried to generate some internal
    audit information for the managed transfer based on the items
    that had been transferred. However, because the managed transfer
    hadn't actually started, the attempt to do this failed and the
    source agent generated an FDC containing the following
    information:
    
    ...
    Thread:     38 (TransferCompletedWorker)
    Class:
    com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread
    Method:     processCommand
    Probe:      FFDC_001
    Cause:
    com.ibm.wmqfte.cmdhandler.FTECommandHandlerErrorException:
    BFGCH0066E: An internal error has occurred. An exception has
    been reported by the State Store. The exception is
    java.lang.NullPointerException
    com.ibm.wmqfte.cmdhandler.FTECommandHandlerErrorException:
    BFGCH0066E: An internal error has occurred. An exception has
    been reported by the State Store. The exception is
    java.lang.NullPointerException
    &#09;at
    com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerStateStoreImpl.p
    rocessInternalCommand(FTECommandHandlerStateStoreImpl.java:...)
    &#09;at
    com.ibm.wmqfte.cmdhandler.impl.MessageHandlerImpl.internalMessag
    eHandler(MessageHandlerImpl.java:...)
    &#09;at
    com.ibm.wmqfte.cmdhandler.impl.MessageHandlerImpl.processCommand
    (MessageHandlerImpl.java:...)
    &#09;at
    com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread.pro
    cessCommand(FTECommandHandlerWorkerThread.java:...)
    &#09;at
    com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerWorkerThread.run
    (FTECommandHandlerWorkerThread.java:...)
    &#09;at java.lang.Thread.run(Thread.java:...)
    &#09;at com.ibm.wmqfte.thread.FTEThread.run(FTEThread.java:...)
    Caused by: java.lang.NullPointerException
    &#09;at
    com.ibm.wmqfte.transfer.impl.TransferSenderRunnable.auditComplet
    e(TransferSenderRunnable.java:...)
    &#09;at
    com.ibm.wmqfte.transfer.impl.TransferSenderImpl.onComplete(Trans
    ferSenderImpl.java:...)
    &#09;at
    com.ibm.wmqfte.statestore.impl.FTEStateStoreImpl.sendForgetToFar
    End(FTEStateStoreImpl.java:...)
    &#09;at
    com.ibm.wmqfte.cmdhandler.impl.FTECommandHandlerStateStoreImpl.p
    rocessInternalCommand(FTECommandHandlerStateStoreImpl.java:...)
    &#09;... 6 more
    
    This caused the managed transfer to remain registered in the
    agent's internal state store, where it occupied a slot in the
    "source transfer list". This meant that the number of source
    transfers that the agent could participate in, as specified by
    the agent property:
    
      maxSourceTransfers
    
    was reduced by 1.
    
    If the issue occurred multiple times during the lifetime of an
    agent, it was possible for all of the source transfer slots to
    be occupied by managed transfers that had encountered the issue.
    This would result in the agent being unable to act as the source
    agent for any new managed transfers.
    

Problem conclusion

  • To resolve this issue, two changes have been made:
    
    Change 1:
    -----------
    The first change modifies the logic used to handle managed
    transfers that have timed out due to a transfer recovery
    timeout, so that the source agent only collects audit
    information for the items in the managed transfer if the managed
    transfer actually started before it entered recovery and timed
    out. This prevents the FDC containing the NullPointerException
    reported in the APAR from occurring.
    
    Change 2:
    -----------
    In addition to this, the design of the transfer recovery timeout
    functionality has been changed, so that managed transfers which
    have entered a RecoveryTimedOut state are moved into an
    "inactive transfers list". This allows the source agent to
    participate in other managed transfer requests while it is
    waiting to perform the processing required to clean up state for
    managed transfers that have entered recovery and timed out.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.2 LTS   9.2.0.5
    v9.x CD    9.2.5
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT37869

  • Reported component name

    MQ BASE V9.2

  • Reported component ID

    5724H7281

  • Reported release

    920

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-08-03

  • Closed date

    2021-11-29

  • Last modified date

    2021-11-29

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MQ BASE V9.2

  • Fixed component ID

    5724H7281

Applicable component levels

[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSYHRD","label":"IBM MQ"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"920"}]

Document Information

Modified date:
30 November 2021