IBM Support

LO88214: REPEATED ORDEREDLATCH TIMEOUT ERRORS EVEN AFTER THE CAUSE HAS BEEN FIXED

APAR status

  • Closed as program error.

Error description

  • 1. Thread A requests an operation that can only be handled by
    one thread at a time.  To enforce this, it starts an
    OrderedLatch, which allows access one thread at a time.
    2. Thread B needs to perform the same operation, so it latches
    onto the OrderedLatch from #1.
    3. Thread C needs to perform the same operation, so it latches
    onto the OrderedLatch from #1 (but behind Thread B).
    4. Thread D needs to perform the same operation, so it latches
    onto the OrderedLatch from #1 (but behind Thread C).
    5. Thread B times out.  This should cause both C and D to
    time out as well, but only C times out.
    6. Thread E needs to perform the same operation, so it latches
    onto the OrderedLatch from #1.  D should already be gone, but
    because it is not, E waits on D instead of on A as it should.
    7. Thread A finishes and gives up the latch.  At this point, E
    should have been the only waiting thread, acquired the latch,
    and executed the code as the latch holder.  However, E is
    waiting on D, and D is waiting on C, which no longer exists.
    Because C will never signal D that it has finished, D can only
    time out.
    
    If another thread (Thread F) needs to perform the same
    operation, it joins the pile-up of waiting threads.  Each of
    them is waiting on something that no longer exists, so timing
    out is the only possible outcome.
    
    If the latches were linked correctly (if D had gone away with C
    and E had correctly latched onto A), E would have run as soon
    as A was done and F would have followed E as expected.  Neither
    E nor F would have logged the WARNING message about timing out.
    
    Note: This APAR fix only improves how timeouts are handled in
    these situations.  The root cause of the problem is whatever
    holds up A long enough for B to time out in the first place.
    That root cause should still be investigated, as it is most
    likely degrading the user sync experience, but this APAR fix
    will at least allow Traveler to recover more quickly in cases
    where the root cause is intermittent.
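
The linkage problem in steps 1 to 7 can be illustrated with a small, single-threaded model. This is only a sketch under stated assumptions: the class LatchChainSketch, its nested Chain type, and all method names are hypothetical and are not taken from Traveler. The sketch assumes each waiter records the thread it waits on, and that the defect amounts to a timeout removing only the timed-out waiter and its immediate follower rather than every waiter behind it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Single-threaded model of the waiter chain; hypothetical, not Traveler code. */
class LatchChainSketch {

    static final class Chain {
        // waiter -> the thread it is waiting on, kept in arrival order
        private final Map<String, String> waitsOn = new LinkedHashMap<>();
        private final boolean cascade; // true models the intended behavior, false the assumed defect
        private String holder;
        private String tail;

        Chain(String holder, boolean cascade) {
            this.holder = holder;
            this.tail = holder;
            this.cascade = cascade;
        }

        /** A new thread latches on behind whatever is currently last in line. */
        void enqueue(String thread) {
            waitsOn.put(thread, tail);
            tail = thread;
        }

        /** Returns the waiter directly behind the given thread, or null if there is none. */
        private String followerOf(String thread) {
            for (Map.Entry<String, String> e : waitsOn.entrySet()) {
                if (e.getValue().equals(thread)) {
                    return e.getKey();
                }
            }
            return null;
        }

        /** Removes a timed-out waiter plus its followers (all of them only when cascade is true). */
        void timeOut(String thread) {
            String before = waitsOn.get(thread);
            List<String> gone = new ArrayList<>();
            gone.add(thread);
            String current = followerOf(thread);
            while (current != null) {
                gone.add(current);
                if (!cascade) {
                    break; // assumed defect: only the immediate follower goes away
                }
                current = followerOf(current);
            }
            waitsOn.keySet().removeAll(gone);
            if (gone.contains(tail)) {
                tail = before; // the line now ends where the timed-out waiter was attached
            }
        }

        /** The holder finishes; returns the waiter that receives the latch, or null if none can. */
        String release() {
            String next = followerOf(holder);
            if (next != null) {
                waitsOn.remove(next);
                holder = next;
            }
            return next;
        }

        /** Threads still waiting, in arrival order. */
        List<String> waiting() {
            return new ArrayList<>(waitsOn.keySet());
        }
    }

    /** Replays steps 1 to 6: A holds, B, C, D queue up, B times out, E arrives. */
    static Chain scenario(boolean cascade) {
        Chain chain = new Chain("A", cascade);
        chain.enqueue("B");
        chain.enqueue("C");
        chain.enqueue("D");
        chain.timeOut("B");
        chain.enqueue("E");
        return chain;
    }

    public static void main(String[] args) {
        for (boolean cascade : new boolean[] {true, false}) {
            Chain chain = scenario(cascade);
            String next = chain.release(); // step 7: A gives up the latch
            System.out.println("cascade=" + cascade + " next=" + next
                    + " stillWaiting=" + chain.waiting());
        }
    }
}
```

In this model, the cascading variant hands the latch to E as soon as A releases it, while the non-cascading variant leaves D and E waiting on a thread that is gone, so a timeout is the only way out for them. A real latch would block threads and use timed waits; the model keeps only the linkage so the difference is easy to see.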
    

Local fix

  • Restarting Traveler clears the pending latches, but the root
    cause of the slowness that made the latches time out in the
    first place will quite possibly cause timeouts again until it
    is addressed.
    

Problem summary

  • Repeated latch timeout errors are reported.
    

Problem conclusion

  • The IBM Traveler server has been updated to handle this scenario
    correctly when using ordered latches to synchronize threads.
    

Temporary fix

Comments

APAR Information

  • APAR number

    LO88214

  • Reported component name

    LOTUS NOTES TRA

  • Reported component ID

    5724E6204

  • Reported release

    901

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-03-05

  • Closed date

    2016-03-08

  • Last modified date

    2016-03-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    LOTUS NOTES TRA

  • Fixed component ID

    5724E6204

Applicable component levels

  • R901 PSY

       UP



Document information

More support for: IBM Traveler

Software version: 9.0.1

Reference #: LO88214

Modified date: 08 March 2016