A fix is available
APAR status
Closed as program error.
Error description
When the CM retransmits requests (for example, DREQs), a thundering herd problem can occur. If many DREQs were transmitted and they all time out because the remote side is no longer there, the code retransmits them all at the same time, in a loop, on the ib_mad thread. That same thread handles the MAD completions that replenish the MAD send queue, so the queue is not replenished while the retries are in progress, and the send queue can be exhausted momentarily. During that window, other CM operations (for example, sends issued from other threads) can be affected.
Local fix
Problem summary
When the CM retransmits requests (for example, DREQs), a thundering herd problem can occur. If many DREQs were transmitted and they all time out because the remote side is no longer there, the code retransmits them all at the same time, in a loop, on the ib_mad thread. That same thread handles the MAD completions that replenish the MAD send queue, so the queue is not replenished while the retries are in progress, and the send queue can be exhausted momentarily. During that window, other CM operations (for example, sends issued from other threads) can be affected.
Problem conclusion
Increase the MAD buffer send queue size to 512
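The following toy model is only an illustration of the failure mode described above and of why a deeper send queue helps. It is not the OFED or AIX code: the class and function names, the previous queue depth of 128, and the burst sizes are all hypothetical assumptions chosen for the example. Only the new depth of 512 comes from this APAR.

```python
class SendQueue:
    """Toy fixed-depth send queue (hypothetical, not the real MAD structure)."""

    def __init__(self, depth):
        self.depth = depth
        self.in_flight = 0

    def post(self):
        """Try to take one send slot; return False if the queue is full."""
        if self.in_flight >= self.depth:
            return False
        self.in_flight += 1
        return True

    def process_completions(self):
        """Free every slot; in the scenario above this runs on the MAD thread."""
        self.in_flight = 0


def retransmit_burst(depth, timed_out, other_sends):
    """Model one burst of simultaneous retransmissions.

    All `timed_out` requests are retransmitted in a single loop on the one
    thread that also processes completions, so no slot is freed until the
    loop ends. Returns (retransmits accepted, other-thread sends rejected).
    """
    queue = SendQueue(depth)
    accepted = sum(1 for _ in range(timed_out) if queue.post())
    # Sends from other threads arrive before completions are processed.
    rejected = sum(1 for _ in range(other_sends) if not queue.post())
    queue.process_completions()
    return accepted, rejected


if __name__ == "__main__":
    # 128 is an assumed earlier depth, used only for contrast with 512.
    for depth in (128, 512):
        print(depth, retransmit_burst(depth, timed_out=200, other_sends=10))
```

In this model, a burst of 200 retransmissions fills the assumed 128-entry queue and every send from another thread is rejected, while the 512-entry queue absorbs the same burst with room to spare. A larger burst could still exhaust the deeper queue; the increase raises the threshold rather than removing the single-thread bottleneck.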
Temporary fix
Comments
6100-09 - use AIX APAR IV56803
7100-03 - use AIX APAR IV56851
7100-04 - use AIX APAR IV56899
APAR Information
APAR number
IV56899
Reported component name
AIX V7.1
Reported component ID
5765H4000
Reported release
710
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Submitted date
2014-03-20
Closed date
2014-03-20
Last modified date
2016-05-10
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
AIX V7.1
Fixed component ID
5765H4000
Applicable component levels
R710 PSY U861782
UP15/11/23 I 1000
PTF to Fileset Mapping
U861782 ofed.core.rte 7.1.4.0
Document Information
Modified date:
10 May 2016