IBM Support

LI77662: SUBOPTIMAL CODE FOR VECTOR LONG LONG SUBTRACT

APAR status

  • Closed as program error.

Error description

  • The compiler generates suboptimal code for the following test
    case:
    
    $ cat vecadd64.C
    extern "C" vector unsigned long long sub64(vector unsigned long long a,
                                               vector unsigned long long b)
    {
      return a - b;
    }
    
    Command line:
    xlC -q64 vecadd64.C -O2 -qarch=pwr7 -qaltivec -qlist
    
    Actual compiler output:
         | 000000                           PDEF     sub64
       33|                                  PROC      a,b,vs34,vs35
       35| 0013D0 xxlnor   F0031D17   1     VNOR      vs32=vs35,vs35
       35| 0013D4 ld       E8620010   1     L8        gr3=.+CONSTANT_AREA(gr2,0)
       35| 0013D8 addi     38000080   1     LI        gr0=128
       35| 0013DC addi     38800088   1     LI        gr4=136
       35| 0013E0 lxvdsx   7C230299   1     VLDS      vs33=+CONSTANT_AREA(gr3,gr0,0)
       35| 0013E4 lxvdsx   7C032298   1     VLDS      vs0=+CONSTANT_AREA(gr3,gr4,0)
       35| 0013E8 vadduwm  10600880   1     VADDUWM   vs35=vs32,vs33
       35| 0013EC vaddcuw  10000980   1     VADDCUW   vs32=vs32,vs33
       35| 0013F0 xxsldwi  F0000117   1     VSLDWI    vs32=vs32,vs32,1
       35| 0013F4 vadduwm  10001880   1     VADDUWM   vs32=vs32,vs35
       35| 0013F8 vadduwm  10201080   1     VADDUWM   vs33=vs32,vs34
       35| 0013FC vaddcuw  10001180   1     VADDCUW   vs32=vs32,vs34
       35| 001400 xxsldwi  F0200116   1     VSLDWI    vs1=vs32,vs32,1
       35| 001404 xxland   F0000C11   1     VAND      vs32=vs0,vs1
       35| 001408 vadduwm  10400880   1     VADDUWM   vs34=vs32,vs33
       36| 00140C bclr     4E800020   1     BA        lr
         |               Tag Table
         | 001410        00000000 00092200 00000000 00000040
         |               Instruction count           16
         |               Straight-line exec time     16
    
    
    The compiler could instead generate the following sequence, which
    uses 9 instructions instead of 16 (7 fewer):
    
         | 000000                           PDEF     sub64_opt
       38|                                  PROC      a,b,vs34,vs35
       42| 001420 vcmpgtuw 10231286   1     VCMPGTUW  vs33=vs35,vs34
       46| 001424 addi     38000090   1     LI        gr0=144
       43| 001428 vsubuwm  10621C80   1     VSUBUWM   vs35=vs34,vs35
       46| 00142C ld       E8620010   1     L8        gr3=.+CONSTANT_AREA(gr2,0)
       46| 001430 xxlxor   F00004D7   1     VXOR      vs32=vs32,vs32
       46| 001434 lxvd2x   7C430699   1     VLQD      vs34=+CONSTANT_AREA(gr3,gr0,0)
       46| 001438 vperm    100100AB   1     VPERM     vs32=vs33,vs32,vs34
       47| 00143C vadduwm  10401880   1     VADDUWM   vs34=vs32,vs35
       49| 001440 bclr     4E800020   1     BA        lr
         |               Tag Table
         | 001444        00000000 00092000 00000000 00000024
         |               Instruction count            9
         |               Straight-line exec time      9
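The 16-instruction sub64 sequence is easier to follow as a portable C++ model. The sketch below is one reading of the listing and does not describe IBM's code generator. It makes three assumptions: big-endian word order, a first CONSTANT_AREA load that supplies 1 in each doubleword, and a second load that supplies a mask keeping only the high words. Because the listing uses only 32-bit word operations, it computes a - b as a + (~b + 1). Each 64-bit add then takes four or five operations: a word add (vadduwm), a carry vector (vaddcuw), a one-word rotate (xxsldwi) that moves each low word's carry into its high word, an optional mask (xxland), and a final word add. The names Words, add64, sub64_carry_model and matches_native are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit vector: four 32-bit words in big-endian
// element order (w[0] = high word of doubleword 0, w[1] = its low word, ...).
using Words = std::array<std::uint32_t, 4>;

static Words from_u64(std::uint64_t d0, std::uint64_t d1) {
    return {(std::uint32_t)(d0 >> 32), (std::uint32_t)d0,
            (std::uint32_t)(d1 >> 32), (std::uint32_t)d1};
}

static std::uint64_t dword(const Words& w, int i) {
    return ((std::uint64_t)w[2 * i] << 32) | w[2 * i + 1];
}

// 64-bit add built from 32-bit word operations.
static Words add64(const Words& x, const Words& y) {
    Words sum{}, carry{};
    for (int i = 0; i < 4; ++i) {
        sum[i] = x[i] + y[i];                 // vadduwm: add modulo 2^32
        carry[i] = sum[i] < x[i] ? 1u : 0u;   // vaddcuw: carry out of each word
    }
    // xxsldwi by one word: each low word's carry lands in the high word of the
    // same doubleword. The mask (xxland) drops carries out of the high words.
    // The listing omits the mask in the first add, where the high words of the
    // assumed constant are 0 and so cannot carry.
    const Words rotated = {carry[1], carry[2], carry[3], carry[0]};
    const Words high_mask = {0xFFFFFFFFu, 0, 0xFFFFFFFFu, 0};
    for (int i = 0; i < 4; ++i)
        sum[i] += rotated[i] & high_mask[i];  // vadduwm: apply the carries
    return sum;
}

// a - b computed as a + (~b + 1), following the sub64 listing.
static Words sub64_carry_model(const Words& a, const Words& b) {
    Words not_b{};
    for (int i = 0; i < 4; ++i)
        not_b[i] = ~b[i];                     // xxlnor (VNOR b,b)
    const Words one = {0, 1, 0, 1};           // assumed constant: 1 per doubleword
    return add64(a, add64(not_b, one));
}

// True when both doublewords match ordinary 64-bit unsigned subtraction.
static bool matches_native(std::uint64_t a0, std::uint64_t a1,
                           std::uint64_t b0, std::uint64_t b1) {
    const Words r = sub64_carry_model(from_u64(a0, a1), from_u64(b0, b1));
    return dword(r, 0) == a0 - b0 && dword(r, 1) == a1 - b1;
}
```

For example, matches_native(5, 0x100000000, 3, 1) checks both doublewords of the model against plain 64-bit subtraction. The listing's 16 instructions break down as follows:

  • 1 for the complement
  • 5 to load the two constants
  • 4 for the first 64-bit add
  • 5 for the second 64-bit add
  • 1 for the return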
    

Local fix

  • Rewriting the subtraction by hand, as follows, makes the compiler
    generate the shorter sequence shown above for sub64_opt:
    
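    extern "C" vector unsigned long long sub64_opt(vector unsigned long long a,
                                                   vector unsigned long long b)
    {
      // Treat each 64-bit element as two 32-bit words.
      vector unsigned int ai = (vector unsigned int)a;
      vector unsigned int bi = (vector unsigned int)b;
      // All ones in each word where b > a, that is, where the word-wise
      // subtraction borrows.
      vector unsigned int ov = (vector unsigned int)vec_cmpgt(bi, ai);
      // Word-wise difference, ignoring borrows between words.
      vector unsigned int diff = ai - bi;
      vector unsigned int vn = { 0, 0, 0, 0 };
      // Replicate the low-word borrow masks (bytes 7 and 15 of ov, big-endian
      // element order) across the high word of each doubleword; selector
      // 0x1F picks a zero byte from vn for the low words.
      vector unsigned char vp = { 0x07, 0x07, 0x07, 0x07, 0x1F, 0x1F, 0x1F, 0x1F,
                                  0x0F, 0x0F, 0x0F, 0x0F, 0x1F, 0x1F, 0x1F, 0x1F };
      ov = vec_perm(ov, vn, vp);
      // Adding an all-ones word (-1) subtracts the borrow from the high word.
      diff = diff + ov;
      return (vector unsigned long long)diff;
    }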
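The local fix replaces carry propagation with borrow detection. vec_cmpgt flags each word where b exceeds a. vec_perm moves the flag of each low word into the high word of the same doubleword. Adding that all-ones word then subtracts the borrow. The portable C++ model below is a sketch for checking this reasoning off-target, not a replacement for the vector code. It assumes big-endian element order, as on POWER7. It emulates vec_perm byte by byte, using the low five bits of each selector byte as an index into the 32-byte concatenation of the two inputs. The names Words, Bytes, perm, sub64_borrow_model and matches_native are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit vector: four 32-bit words in big-endian
// element order (w[0] = high word of doubleword 0, w[1] = its low word, ...).
using Words = std::array<std::uint32_t, 4>;
using Bytes = std::array<std::uint8_t, 16>;

static Words from_u64(std::uint64_t d0, std::uint64_t d1) {
    return {(std::uint32_t)(d0 >> 32), (std::uint32_t)d0,
            (std::uint32_t)(d1 >> 32), (std::uint32_t)d1};
}

static std::uint64_t dword(const Words& w, int i) {
    return ((std::uint64_t)w[2 * i] << 32) | w[2 * i + 1];
}

// Big-endian byte view of a vector, as vec_perm indexes it.
static Bytes to_bytes(const Words& w) {
    Bytes b{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            b[4 * i + j] = (std::uint8_t)(w[i] >> (24 - 8 * j));
    return b;
}

static Words to_words(const Bytes& b) {
    Words w{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            w[i] = (w[i] << 8) | b[4 * i + j];
    return w;
}

// vec_perm(x, y, p): result byte k is byte (p[k] & 0x1F) of x followed by y.
static Words perm(const Words& x, const Words& y, const Bytes& p) {
    const Bytes bx = to_bytes(x), by = to_bytes(y);
    Bytes r{};
    for (int k = 0; k < 16; ++k) {
        const int idx = p[k] & 0x1F;
        r[k] = idx < 16 ? bx[idx] : by[idx - 16];
    }
    return to_words(r);
}

// Scalar model of sub64_opt from the local fix.
static Words sub64_borrow_model(const Words& a, const Words& b) {
    Words ov{}, diff{};
    for (int i = 0; i < 4; ++i) {
        ov[i] = b[i] > a[i] ? 0xFFFFFFFFu : 0u;  // vec_cmpgt(bi, ai)
        diff[i] = a[i] - b[i];                   // word-wise subtract, mod 2^32
    }
    const Words vn = {0, 0, 0, 0};
    const Bytes vp = {0x07, 0x07, 0x07, 0x07, 0x1F, 0x1F, 0x1F, 0x1F,
                      0x0F, 0x0F, 0x0F, 0x0F, 0x1F, 0x1F, 0x1F, 0x1F};
    ov = perm(ov, vn, vp);  // low-word borrow masks moved into the high words
    for (int i = 0; i < 4; ++i)
        diff[i] += ov[i];   // adding 0xFFFFFFFF subtracts 1 modulo 2^32
    return diff;
}

// True when both doublewords match ordinary 64-bit unsigned subtraction.
static bool matches_native(std::uint64_t a0, std::uint64_t a1,
                           std::uint64_t b0, std::uint64_t b1) {
    const Words r = sub64_borrow_model(from_u64(a0, a1), from_u64(b0, b1));
    return dword(r, 0) == a0 - b0 && dword(r, 1) == a1 - b1;
}
```

The model mirrors the shape of the sub64_opt listing: one word compare (vcmpgtuw), one word subtract (vsubuwm), one permute with a loaded control vector (vperm), and one word add (vadduwm). The borrow out of each high word is simply discarded by the permute.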
    

Problem summary

  • PROBLEM DESCRIPTION: Inefficient code is generated for vector
    unsigned 64-bit subtraction.
    
    USERS AFFECTED: Users of V12/14 who compile with -qarch=pwr7 or
    higher and -qaltivec, and whose code uses vec_sub on vector
    unsigned long long types.
    

Problem conclusion

  • Code generation for vector subtraction has been improved so that
    fewer instructions are generated. Apply the provided service.
    

Temporary fix

Comments

APAR Information

  • APAR number

    LI77662

  • Reported component name

    XL C/C++ FOR LI

  • Reported component ID

    5725C7300

  • Reported release

    C10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2013-10-28

  • Closed date

    2013-10-28

  • Last modified date

    2013-10-28

  • APAR is sysrouted FROM one or more of the following:

    IV37235

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    XL C/C++ FOR LI

  • Fixed component ID

    5725C7300

Applicable component levels

  • Product

    XL C/C++ for Linux (SSXVZZ)

  • Version

    12.1

  • Platform

    Platform Independent

  • Line of Business

    Power

  • Business Unit

    IBM Infrastructure w/TPS

Document Information

Modified date:
14 October 2021