IBM Support

IV58828: RDMA DEVICE STOPS WORKING DUE TO EEH APPLIES TO AIX 6100-09

A fix is available

APAR status

  • Closed as program error.

Error description

  • The customer's application may stop running, and the following
    entry appears in the errlog:
    
    LABEL:           MLXENT_EEH_DETECT
    IDENTIFIER:      B81629A3
    
    Date/Time:       Sat Mar 29 13:32:04 CDT 2014
    Sequence Number: 6391
    Machine Id:      00F64BB44C00
    Node Id:         psmem11
    Class:           H
    Type:            TEMP
    WPAR:            Global
    Resource Name:   ent2
    Resource Class:  adapter
    Resource Type:   mlx
    Location:        U5877.001.00H2713-P1-C10-T1-L1
    
    VPD:
          PCIe2 2-Port 10GbE RoCE SFP+ Adapter:
            Part Number.................00E1493
            EC Level....................D77286
            FRU Number..................00E1493
            Serial Number...............00E1493YA50AG33K008
            Manufacture ID..............11211472280147
            Network Address.............0002C92C7CE3
            ROM Level.(alterable).......000200091316
    
    Description
    EEH freeze detected
    
            Recommended Actions
            PERFORM PROBLEM DETERMINATION PROCEDURES
    
    Detail Data
    FILE NAME
    line: 271 file: entcore_eeh.c
    MAC ADDRESS
    0002 C92C 7CE3
    DEVICE DRIVER INTERNAL STATE
    0000 0000 4000 0000 0000 0000 0000 0002 0000 0000 4000 0000
    PCI ETHERNET STATISTICS
    0061 091B 0000 0000 014F F413 0000 0000 549E 4E96 0000 0000
    0110 8282 0000 0000 0004 A9A4 0000 0000 01D6 513A 0000 0000
    0000 1387 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000
    TRACE RECORD SEQUENCE NUMBER
    e:0 l:271 f:entcore_eeh_timer_handler r:0xEEEE0000859B0674 s:0 o:0
    SENSE DATA
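
To check whether a system has logged this condition, captured errlog text can be scanned for the label and identifier of the entry above. The following is a minimal Python sketch and not part of the APAR. It assumes the detailed errlog report has already been saved as text; the function name find_eeh_entries is hypothetical.

```python
import re

# Label and identifier taken from the errlog entry in this APAR.
EEH_LABEL = "MLXENT_EEH_DETECT"
EEH_IDENTIFIER = "B81629A3"


def find_eeh_entries(errpt_text):
    """Return the resource names of entries matching the EEH label and identifier.

    Assumes detailed errlog text in which each entry starts with a 'LABEL:'
    line and carries 'IDENTIFIER:' and 'Resource Name:' fields.
    """
    resources = []
    # Split the text into entries at each line that begins with 'LABEL:'.
    for entry in re.split(r"(?m)^\s*(?=LABEL:)", errpt_text):
        label = re.search(r"LABEL:\s+(\S+)", entry)
        ident = re.search(r"IDENTIFIER:\s+(\S+)", entry)
        if not label or not ident:
            continue  # not an errlog entry (for example, leading text)
        if label.group(1) == EEH_LABEL and ident.group(1) == EEH_IDENTIFIER:
            name = re.search(r"Resource Name:\s+(\S+)", entry)
            resources.append(name.group(1) if name else None)
    return resources
```

For the entry shown above, the function would report the resource ent2. A match only indicates that an EEH freeze was logged for an mlx adapter; it does not by itself confirm that this APAR is the cause.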
    

Local fix

Problem summary

  • The customer's application may stop running, and an
    MLXENT_EEH_DETECT entry (identifier B81629A3, "EEH freeze
    detected") is logged against the RoCE adapter in the errlog.
    The full entry is shown under "Error description".
    

Problem conclusion

  • The fix corrects the calculation of the queue pair (QP) DMA size.
    

Temporary fix

Comments

  • 6100-09 - use AIX APAR IV58828
    7100-03 - use AIX APAR IV58847
    7100-04 - use AIX APAR IV58945
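
The level-to-APAR list above can be expressed as a small lookup. This is an illustrative Python sketch only, not part of the APAR. The names APAR_BY_LEVEL and apar_for_level are hypothetical, and the input is assumed to be a technology level string of the form reported by oslevel -r (for example 6100-09) or a longer dash-separated form that begins with it.

```python
# Technology level to APAR mapping, copied from the Comments list in this APAR.
APAR_BY_LEVEL = {
    "6100-09": "IV58828",
    "7100-03": "IV58847",
    "7100-04": "IV58945",
}


def apar_for_level(level):
    """Return the APAR for a technology level string, or None if not listed.

    Only the first two dash-separated fields are used, so '6100-09' and a
    longer string that starts with '6100-09-' resolve to the same APAR.
    """
    key = "-".join(level.strip().split("-")[:2])
    return APAR_BY_LEVEL.get(key)
```

A level that is not in the list returns None, which here means only that this document names no APAR for it.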
    

APAR Information

  • APAR number

    IV58828

  • Reported component name

    AIX 610 STD EDI

  • Reported component ID

    5765G6200

  • Reported release

    610

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2014-04-07

  • Closed date

    2014-04-07

  • Last modified date

    2016-05-10

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IV58847 IV58945

Fix information

  • Fixed component name

    AIX 610 STD EDI

  • Fixed component ID

    5765G6200

Applicable component levels

  • R610 PSY U859269

       UP14/05/21 I 1000

PTF to Fileset Mapping



Document information

More support for: AIX Standard Edition

Software version: 610

Operating system(s): AIX

Reference #: IV58828

Modified date: 10 May 2016