IBM Support

IV56851: CONNECTION MGR OPERATIONS MAY FAIL DUE TO EXHAUSTED SEND BUFF APPLIES TO AIX 7100-03

A fix is available

APAR status

  • Closed as program error.

Error description

  • When the CM retransmits messages (a DREQ, for example), a
    thundering herd can occur. If many DREQs were transmitted and
    the remote side is no longer there, they all time out and the
    code retransmits them at the same time, in a loop, on the
    ib_mad thread. That is the same thread that handles the MAD
    completions, which replenish the MAD send queue. No
    replenishment therefore happens during the retries, and the
    send queue can be exhausted momentarily during this phase.
    While it is exhausted, CM operations (sends from other
    threads, for example) can fail.
    

Local fix

Problem summary

  • When the CM retransmits messages (a DREQ, for example), a
    thundering herd can occur. If many DREQs were transmitted and
    the remote side is no longer there, they all time out and the
    code retransmits them at the same time, in a loop, on the
    ib_mad thread. That is the same thread that handles the MAD
    completions, which replenish the MAD send queue. No
    replenishment therefore happens during the retries, and the
    send queue can be exhausted momentarily during this phase.
    While it is exhausted, CM operations (sends from other
    threads, for example) can fail.
    

Problem conclusion

  • Increase the MAD buffer send queue size to 512
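
    The failure mode and the effect of a deeper send queue can be
    illustrated with a toy model. This is only a sketch, not the AIX
    implementation: the SendQueue class, the simulate_retry_burst
    function, the burst of 200 timed-out DREQs, and the smaller depth
    of 128 are all hypothetical values chosen for contrast (the APAR
    does not state the previous queue size). The model assumes that a
    send slot is freed only when a completion is processed, and that
    completions are processed on the same thread that runs the retry
    loop.

```python
class SendQueue:
    """Fixed-depth send queue (hypothetical model).

    A slot is taken by post() and freed only by complete().
    """

    def __init__(self, depth):
        self.depth = depth
        self.in_flight = 0

    def post(self):
        # Fail the send if every slot is already in use.
        if self.in_flight >= self.depth:
            return False
        self.in_flight += 1
        return True

    def complete(self):
        # Processing a completion frees one slot.
        if self.in_flight > 0:
            self.in_flight -= 1


def simulate_retry_burst(depth, timed_out):
    """Model one worker thread retransmitting `timed_out` messages in a loop.

    Returns (retransmits_posted, other_thread_send_ok, send_ok_after_drain).
    """
    queue = SendQueue(depth)

    # Retry loop: the worker posts retransmits back to back. It cannot
    # process completions while it is in this loop, so no slot is freed.
    posted = 0
    for _ in range(timed_out):
        if queue.post():
            posted += 1

    # A different thread tries to send while the worker is still busy.
    other_thread_ok = queue.post()

    # The worker returns to completion handling and replenishes the queue.
    for _ in range(posted):
        queue.complete()
    recovered = queue.post()

    return posted, other_thread_ok, recovered


if __name__ == "__main__":
    print(simulate_retry_burst(depth=128, timed_out=200))  # (128, False, True)
    print(simulate_retry_burst(depth=512, timed_out=200))  # (200, True, True)
```

    In this model the shallow queue is exhausted during the burst and
    the send from the other thread fails, although the queue recovers
    once completions are handled again. With a depth of 512 the same
    burst leaves free slots. A deeper queue raises the burst size that
    can be absorbed; it does not remove the coupling between retries
    and completion handling.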
    

Temporary fix

Comments

  • 6100-09 - use AIX APAR IV56803
    7100-03 - use AIX APAR IV56851
    7100-04 - use AIX APAR IV56899
    

APAR Information

  • APAR number

    IV56851

  • Reported component name

    AIX V7.1

  • Reported component ID

    5765H4000

  • Reported release

    710

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2014-03-19

  • Closed date

    2014-03-19

  • Last modified date

    2016-05-10

  • APAR is sysrouted FROM one or more of the following:

    IV56803

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX V7.1

  • Fixed component ID

    5765H4000

Applicable component levels

  • R710 PSY U858987

       UP14/05/22 I 1000

PTF to Fileset Mapping



Document information

More support for: AIX Enterprise Edition

Software version: 710

Operating system(s): AIX

Reference #: IV56851

Modified date: 10 May 2016