Performance information for the MASS libraries for AIX

The following tables provide approximate performance data for the MASS scalar and vector libraries running under AIX on various POWER machines.

The columns labeled libm and mass list the results obtained with the libm.a system library and the libmass.a library, respectively. These data were obtained by timing many repetitions of a loop over 1000 random arguments, and they include all overheads. Timing in this way brings the input and output vectors into the on-chip cache, because the loop is short enough for the vectors to fit in the cache. Performance may deteriorate significantly when the input and output vectors are not in the cache. Performance may also deteriorate for arguments at or near the end-points of the valid argument ranges.

The columns labeled massv, vp4, vp5, and vp6 list the results obtained with the libmassv.a, libmassvp4.a, libmassvp5.a, and libmassvp6.a libraries, respectively. They give estimates of the number of cycles per evaluation of a vector element. The estimates used vectors of length 1000 so that the caches contain all the vectors.

Although the vector names (e.g. vacos) are used in the Function column, the libm and mass columns refer to the corresponding scalar function (e.g. acos). The last two columns of each table give the libm result divided by the mass result and by the vector library result, respectively.

The system library measurements were made with the versions of the library available on the test systems. They may differ from the versions timed for previous versions of MASS.

Users may experience performance that differs from that found in these tables. Results will vary with vector length.

Entries in the table where the library function does not exist, or the measurement was not done, are left blank.

The range key at the end of this section applies to all the tables.

POWER6 Performance

POWER6 libmassvp6.a Performance -- double precision functions
(cycles per evaluation, length 1000 loop)

                                               libm/   libm/
Function   Range      libm     mass      vp6    mass     vp6
============================================================
vacos      B        207.45   209.89    27.79    0.99    7.46
vacosh     G        909.74   315.96    46.82    2.88   19.43
vasin      B        190.48   231.28    27.72    0.82    6.87
vasinh     D        707.13   316.86    30.28    2.23   23.35
vatan      B        199.42   122.24             1.63
vatan      D        209.35   163.78             1.28
vatan2     D       1402.90   174.20    37.15    8.05   37.76
vatanh     B        515.45   249.08    26.48    2.07   19.47
vcbrt      D        465.86   191.78    16.57    2.43   28.11
vcopysign  D         81.70   103.11             0.79
vcos       B        147.89    93.09    15.87    1.59    9.32
vcos       D        201.36   160.31    16.94    1.26   11.89
vcosh      D        451.98   148.04    14.54    3.05   31.09
vcosisin   B        314.20   153.68    57.21    2.04    5.49
vcosisin   D        427.27   219.00   110.59    1.95    3.86
vdint      D                           15.68
vdiv       D                            7.88
vdnint     D        333.75   106.77    16.70    3.13   19.99
verf       C        174.88   129.52             1.35
verfc      C        238.96   189.26             1.26
vexp       D        308.19   110.07    11.74    2.80   26.25
vexpm1     D        319.71   123.91    13.46    2.58   23.75
vhypot     D        791.58   231.96             3.41
vlgamma    H        667.95   387.43             1.72
vlog       C        362.77   163.75    12.34    2.22   29.40
vlog10     C        468.10   142.26    12.34    3.29   37.93
vlog1p     H        346.44   168.21    18.47    2.06   18.76
vpow       C        739.87   297.85    36.44    2.48   20.30
vqdrt      C                           10.26
vrcbrt     D                           16.82
vrec       D                            7.85
vrqdrt     C                           10.89
vrsqrt     C        214.11   148.20    11.17    1.44   19.17
vsin       B        149.26    93.14    15.95    1.60    9.36
vsin       D        206.34   160.70    16.86    1.28   12.24
vsincos    B        276.75   133.54   103.31    2.07    2.68
vsincos    D        400.12   169.55   110.48    2.36    3.62
vsinh      D        552.82   126.78    15.58    4.36   35.48
vsqrt      C        176.05   146.32    11.14    1.20   15.80
vtan       D        358.63   133.32    61.13    2.69    5.87
vtanh      F        584.75   148.94    22.43    3.93   26.07

*libm routine uses hardware instruction

POWER6 libmassvp6.a Performance -- single precision functions
(cycles per evaluation, length 1000 loop)

                                               libm/   libm/
Function   Range      libm     mass      vp6    mass     vp6
============================================================
vsacos     B        236.16   190.27    23.49    1.24   10.05
vsacosh    G        936.49   289.47    39.96    3.24   23.44
vsasin     B        218.83   191.08    22.49    1.15    9.73
vsasinh    D        720.21   302.50    25.79    2.38   27.93
vsatan     B        229.88   131.22             1.75
vsatan     D        243.57   204.41             1.19
vsatan2    D       1431.90   218.79    22.43    6.54   63.84
vsatanh    B        526.16   227.70    24.66    2.31   21.34
vscbrt     D        481.13   179.58    12.35    2.68   38.96
vscopysign D        109.55   113.27             0.97
vscos      B        171.00   113.53    14.23    1.51   12.02
vscos      D        230.56   177.10    14.74    1.30   15.64
vscosh     D        579.94   161.61    15.58    3.59   37.22
vscosisin  B                           54.42
vscosisin  D                          104.33
vsdiv      D                            7.83
vserf      C        200.58   105.43             1.90
vserfc     C        278.30   125.64             2.22
vsexp      D        388.71   127.43    10.78    3.05   36.06
vsexpm1    D        416.46   143.63    12.32    2.90   33.80
vshypot    D        934.26   206.52             4.52
vslgamma   H        703.24   358.59             1.96
vslog      C        400.68   148.25    10.75    2.70   37.27
vslog10    C        497.09   142.86    10.69    3.48   46.50
vslog1p    H        389.21   143.08    13.49    2.72   28.85
vspow      C       1043.39   230.82    23.55    4.52   44.31
vsqdrt     C                           10.15
vsrcbrt    D                           11.37
vsrec      D                            5.61
vsrint     D        211.34    97.18             2.17
vsrqdrt    C                            9.94
vsrsqrt    C                           10.09
vssin      B        168.95   118.95    13.64    1.42   12.39
vssin      D        232.15   179.46    14.60    1.29   15.90
vssincos   B                           54.67
vssincos   D                          104.31
vssinh     D        679.49   170.26    16.82    3.99   40.40
vssqrt     C                            8.85
vstan      D        353.55   195.37    58.09    1.81    6.09
vstanh     F        581.74   153.27    21.28    3.80   27.34

POWER5 Performance

POWER5 libmassvp5.a Performance -- double precision functions
(cycles per evaluation, length 1000 loop)

                                               libm/   libm/
Function   Range      libm     mass      vp5    mass     vp5
============================================================
vacos      B        122.40   107.72    22.95    1.14    5.33
vacosh     G        591.30   175.51    30.66    3.37   19.29
vasin      B        121.08   151.69    22.94    0.80    5.28
vasinh     D        487.30   180.82    21.74    2.69   22.41
vatan      B         70.78    49.48             1.43
vatan      D        118.35    78.45             1.51
vatan2     D        787.62    94.98    24.56    8.29   32.07
vatanh     B        310.43   156.44    18.78    1.98   16.53
vcbrt      D        330.95   100.39    11.43    3.30   28.95
vcopysign  D         37.96    42.07             0.90
vcos       B         68.22    33.08    13.15    2.06    5.19
vcos       D         87.14    69.09    14.66    1.26    5.94
vcosh      D        282.10    56.25    12.72    5.02   22.18
vcosisin   B        157.57    71.85     9.30    2.19   16.94
vcosisin   D        200.15   129.25    17.79    1.55   11.25
vdint      D                            6.36
vdiv       D                            5.38
vdnint     D        118.36    48.09     7.72    2.46   15.33
verf       C         82.31    57.42             1.43
verfc      C        111.44   111.71             1.00
vexp       D        172.40    46.87    10.61    3.68   16.25
vexpm1     D        238.04    49.44    11.47    4.81   20.75
vhypot     D        643.93   108.88             5.91
vlgamma    H        391.16   236.46             1.65
vlog       C        215.88    86.74     9.36    2.49   23.06
vlog10     C        325.08    79.44     9.44    4.09   34.44
vlog1p     H        220.76    71.64    12.09    3.08   18.26
vpow       C        463.26   149.60    29.58    3.10   15.66
vqdrt      C                            7.56
vrcbrt     D                           11.32
vrec       D                            5.00
vrqdrt     C                            6.84
vrsqrt     C        182.82    74.92     6.31    2.44   28.97
vsin       B         69.99    34.57    13.19    2.02    5.31
vsin       D         88.63    69.09    14.68    1.28    6.04
vsincos    B        131.02    46.69    15.66    2.81    8.37
vsincos    D        171.33    92.53    17.78    1.85    9.64
vsinh      D        357.66    63.43    13.18    5.64   27.14
vsqrt      C        139.37    75.50     6.27    1.85   22.23
vtan       D        222.71    73.50    17.42    3.03   12.78

*libm routine uses hardware instruction

POWER5 libmassvp5.a Performance -- single precision functions
(cycles per evaluation, length 1000 loop)

                                               libm/   libm/
Function   Range      libm     mass      vp5    mass     vp5
============================================================
vsacos     B        143.39    88.92    13.96    1.61   10.27
vsacosh    G        610.35   160.14    22.59    3.81   27.02
vsasin     B        136.26    82.44    13.75    1.65    9.91
vsasinh    D        496.50   172.10    17.31    2.88   28.68
vsatan     B         81.85    48.10             1.70
vsatan     D        131.21    98.11             1.34
vsatan2    D        797.35   147.51    15.50    5.41   51.44
vsatanh    B        320.59   109.71    16.57    2.92   19.35
vscbrt     D        340.06    89.94     7.80    3.78   43.60
vscopysign D         52.84    87.04             0.61
vscos      B         80.61    34.02    10.33    2.37    7.80
vscos      D        111.43    71.87    11.89    1.55    9.37
vscosh     D        323.33    68.11    12.40    4.75   26.08
vscosisin  B                            7.25
vscosisin  D                           14.54
vsdiv      D                            4.68
vserf      C        106.99    38.87             2.75
vserfc     C        119.67    64.13             1.87
vsexp      D        228.64    58.80    10.18    3.89   22.46
vsexpm1    D        281.13    72.35    10.86    3.89   25.89
vshypot    D        697.12   107.04             6.51
vslgamma   H        431.48   246.31             1.75
vslog      C        213.10    57.08     6.83    3.73   31.20
vslog10    C        345.52    58.30     6.83    5.93   50.59
vslog1p    H        234.36    70.87     8.81    3.31   26.60
vspow      C        542.71   128.08    19.99    4.24   27.15
vsqdrt     C                            6.86
vsrcbrt    D                            7.71
vsrec      D                            3.71
vsrint     D         50.09    64.42             0.78
vsrqdrt    C                            6.18
vsrsqrt    C                            5.30
vssin      B         82.56    34.51    10.35    2.39    7.98
vssin      D        116.39    69.77    11.84    1.67    9.83
vssincos   B                            7.24
vssincos   D                           14.56
vssinh     D        423.50    68.59    13.06    6.17   32.43
vssqrt     C                            5.46
vstan      D        218.19    83.15    15.39    2.62   14.18
vstanh     F        362.76    67.85    15.10    5.35   24.02
vtanh      F        363.40    77.29    18.06    4.70   20.12

POWER4+ Performance

POWER4+ libmassvp4.a Performance -- double precision functions
(cycles per evaluation, length 1000 loop)

                                               libm/   libm/
Function   Range      libm     mass      vp4    mass     vp4
============================================================
vacos      B        115.06   101.62    24.09    1.13    4.78
vacosh     G        536.48   175.32    32.30    3.06   16.61
vasin      B        109.83   143.17    23.98    0.77    4.58
vasinh     D        462.63   182.09    23.39    2.54   19.78
vatan      B         65.07    60.10             1.08
vatan      D        100.50    89.50             1.12
vatan2     D        715.44   104.95    24.11    6.82   29.67
vatanh     B        294.31   133.58    20.42    2.20   14.41
vcbrt      D        333.19   102.93    10.63    3.24   31.34
vcopysign  D         37.09    36.33             1.02
vcos       B         65.48    36.61    13.60    1.79    4.81
vcos       D         91.77    66.66    14.89    1.38    6.16
vcosh      D        272.80    55.58    13.93    4.91   19.58
vcosisin   B        159.73    71.67    10.21    2.23   15.64
vcosisin   D        209.38   144.78    17.62    1.45   11.88
vdint      D                            6.58
vdiv       D                            5.79
vdnint     D        112.80    42.19     8.81    2.67   12.80
verf       C         78.75    62.42             1.26
verfc      C        105.09   108.70             0.97
vexp       D        162.75    46.41    10.81    3.51   15.06
vexpm1     D        216.95    51.41    11.95    4.22   18.15
vhypot     D        550.15   101.81             5.40
vlgamma    H        368.00   209.47             1.76
vlog       C        203.36    85.63     9.85    2.37   20.65
vlog10     C        284.48    82.07    10.15    3.47   28.03
vlog1p     H        201.22    78.70    12.66    2.56   15.89
vpow       C        449.94   146.70    29.78    3.07   15.11
vqdrt      C                            7.85
vrcbrt     D                           10.45
vrec       D                            5.13
vrqdrt     C                            7.66
vrsqrt     C        163.48    74.76     6.95    2.19   23.52
vsin       B         62.60    37.30    13.61    1.68    4.60
vsin       D         91.27    66.58    14.84    1.37    6.15
vsincos    B        119.41    49.25    16.24    2.42    7.35
vsincos    D        175.32    88.17    17.53    1.99   10.00
vsinh      D        349.25    58.65    14.76    5.95   23.66
vsqrt      C        124.84    74.12     7.12    1.68   17.53
vtan       D        200.03    73.33    18.59    2.73   10.76
vtanh      F        380.90    72.80    18.57    5.23   20.51

*libm routine uses hardware instruction

POWER4+ libmassvp4.a Performance -- single precision functions
(cycles per evaluation, length 1000 loop)

                                               libm/   libm/
Function   Range      libm     mass      vp4    mass     vp4
============================================================
vsacos     B        130.37    79.18    14.19    1.65    9.19
vsacosh    G        564.96   164.27    24.25    3.44   23.30
vsasin     B        123.76    94.03    14.16    1.32    8.74
vsasinh    D        482.55   167.74    18.69    2.88   25.82
vsatan     B         71.95    41.49             1.73
vsatan     D        111.71    89.64             1.25
vsatan2    D        734.10   129.02    14.51    5.69   50.59
vsatanh    B        323.16   107.91    17.50    2.99   18.47
vscbrt     D        359.67    80.08     7.58    4.49   47.45
vscopysign D         45.04    79.13             0.57
vscos      B         67.81    29.57    10.88    2.29    6.23
vscos      D         91.88    67.73    12.28    1.36    7.48
vscosh     D        364.40    64.55    13.34    5.65   27.32
vscosisin  B                            8.20
vscosisin  D                           14.56
vsdiv      D                            5.24
vserf      C         91.93    33.42             2.75
vserfc     C        114.37    61.17             1.87
vsexp      D        212.16    57.98    10.36    3.66   20.48
vsexpm1    D        278.57    65.83    10.97    4.23   25.39
vshypot    D        709.54   101.54             6.99
vslgamma   H        497.11   185.17             2.68
vslog      C        210.54    55.16     6.73    3.82   31.28
vslog10    C        289.76    54.95     6.81    5.27   42.55
vslog1p    H        206.38    71.02     9.27    2.91   22.26
vspow      C        554.52   122.42    19.14    4.53   28.97
vsqdrt     C                            7.35
vsrcbrt    D                            7.33
vsrec      D                            4.77
vsrint     D         56.16    39.69             1.41
vsrqdrt    C                            6.76
vsrsqrt    C                            5.58
vssin      B         66.98    35.75    10.92    1.87    6.13
vssin      D         92.40    67.51    12.26    1.37    7.54
vssincos   B                            8.18
vssincos   D                           14.55
vssinh     D        431.54    69.83    14.22    6.18   30.35
vssqrt     C                            5.70
vstan      D        197.23    76.56    14.54    2.58   13.56
vstanh     F        378.52    61.20    16.74    6.18   22.61

PowerPC 604e Performance

PowerPC 604e libmassv.a Performance -- double precision functions
(cycles per evaluation, length 1000 loop)

                                               libm/   libm/
Function   Range      libm     mass    massv    mass   massv
============================================================
vacos      B        103.14   111.17    40.58    0.93    2.54
vacosh     G        350.27   135.58    46.76    2.58    7.49
vasin      B         95.86   123.42    40.46    0.78    2.37
vasinh     D        364.40   154.43    44.25    2.36    8.24
vatan      B         68.89    64.03             1.08
vatan      D        106.62    94.78             1.12
vatan2     D        597.78   103.53    40.33    5.77   14.82
vatanh     B        251.65   140.03    41.23    1.80    6.10
vcbrt      D        295.03    90.84    19.73    3.25   14.95
vcopysign  D         33.03    40.03             0.83
vcos       B         67.61    41.32     9.28    1.64    7.29
vcos       D         89.70    59.95    26.64    1.50    3.37
vcosh      D        201.22    58.67    24.93    3.43    8.07
vcosisin   B                  78.04    19.00
vcosisin   D                 109.94    28.75
vdint      D                            6.26
vdiv       D                           11.46
vdnint     D                  44.02     6.77
verf       C         72.04    52.09             1.38
verfc      C         53.70    93.58             0.57
vexp       D        129.80    50.09    16.06    2.59    8.08
vexpm1     D        145.63    53.37    19.74    2.73    7.38
vhypot     D        366.89    98.45             3.73
vlgamma    H        301.86   194.67             1.55
vlog       C        145.65    66.72    19.29    2.18    7.55
vlog10     C        212.18    77.03    19.43    2.75   10.92
vlog1p     H        169.01    72.04    25.81    2.35    6.55
vpow       C        336.49   124.39    50.96    2.71    6.60
vqdrt      C                           14.81
vrcbrt     D                           19.44
vrec       D                           10.03
vrqdrt     C                           13.94
vrsqrt     C        114.92    63.80    16.03    1.80    7.17
vsin       B         66.01    41.32    10.48    1.60    6.30
vsin       D         92.43    59.95    26.60    1.54    3.47
vsincos    B        128.97    66.04    19.37    1.95    6.66
vsincos    D        177.44    94.67    29.19    1.87    6.08
vsinh      D        260.16    62.91    26.55    4.14    9.80
vsqrt      C         80.94    61.82    16.01    1.31    5.06
vtan       D        205.24    87.72    31.34    2.34    6.55
vtanh      F        308.29    79.59    34.38    3.87    8.97

*libm routine uses hardware instruction

PowerPC 604e libmassv.a Performance -- single precision functions
(cycles per evaluation, length 1000 loop)

                                               libm/   libm/
Function   Range      libm     mass    massv    mass   massv
============================================================
vsacos     B        154.65   110.51    26.72    1.40    5.79
vsacosh    G                 151.88    37.53
vsasin     B        145.87   110.00    27.02    1.33    5.40
vsasinh    D                 165.78    36.68
vsatan     B        120.23    84.02             1.43
vsatan     D        156.87   127.21             1.23
vsatan2    D        655.98   157.48    39.96    4.17   16.42
vsatanh    B                 142.04    35.51
vscbrt     D                 101.72    14.10
vscopysign D                  71.53
vscos      B        111.62    66.02     7.24    1.69   15.42
vscos      D        133.71    98.39    20.11    1.36    6.65
vscosh     D        252.05    90.71    21.06    2.78   11.97
vscosisin  B                           15.00
vscosisin  D                           23.12
vsdiv      D                            9.83
vserf      C                  67.88
vserfc     C                  81.33
vsexp      D        188.01    82.18    13.15    2.29   14.30
vsexpm1    D                  82.80    16.26
vshypot    D                 123.85
vslgamma   H                 208.64
vslog      C        190.66    89.03    15.76    2.14   12.10
vslog10    C        257.18    89.03    15.71    2.89   16.37
vslog1p    H                  99.03    20.28
vspow      C        399.93   129.85    29.93    3.08   13.36
vsqdrt     C                           13.20
vsrcbrt    D                           14.10
vsrec      D                            7.83
vsrint     D                  48.19
vsrqdrt    C                           12.44
vsrsqrt    C                            9.08
vssin      B        110.01    68.02     8.37    1.62   13.14
vssin      D        136.43    92.56    20.12    1.47    6.78
vssincos   B                           14.75
vssincos   D                           23.62
vssinh     D        325.72    86.63    23.77    3.76   13.70
vssqrt     C                            9.67
vstan      D        207.83   123.59    25.89    1.68    8.03
vstanh     F        311.24    84.75    28.02    3.67   11.11

Range Key

A   0,1
B   -1,1
C   0,100
D   -100,100
E   -10,10
F   -20,20
G   1,100
H   -1,100
I   0,10
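
The measurement method described at the start of this section (many repetitions of a loop over 1000 random arguments drawn from a keyed range, reported as a cost per evaluation) can be illustrated with a minimal Python sketch. It is not the harness used to produce the tables and it does not call MASS: it times Python's math.cos as a stand-in, it reports wall-clock time rather than cycles, and the helper names random_arguments, time_per_evaluation, and speedup are hypothetical. The speedup helper shows how the libm/mass and libm/vp6 columns follow from the cycle counts in a row.

```python
import math
import random
import time

# Range key from this section: letter -> (low, high) argument interval.
RANGE_KEY = {
    "A": (0, 1), "B": (-1, 1), "C": (0, 100), "D": (-100, 100),
    "E": (-10, 10), "F": (-20, 20), "G": (1, 100), "H": (-1, 100),
    "I": (0, 10),
}


def random_arguments(range_letter, n=1000, seed=0):
    """Return n uniform random arguments drawn from the keyed range."""
    low, high = RANGE_KEY[range_letter]
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


def time_per_evaluation(func, args, repetitions=200):
    """Average wall-clock seconds per call over many passes of the loop.

    The same short input and output vectors are reused on every pass,
    so the measurement includes loop overhead and warm-cache effects.
    """
    out = [0.0] * len(args)
    start = time.perf_counter()
    for _ in range(repetitions):
        for i, x in enumerate(args):
            out[i] = func(x)
    elapsed = time.perf_counter() - start
    return elapsed / (repetitions * len(args))


def speedup(baseline_cycles, library_cycles):
    """Ratio of the kind reported in the libm/mass and libm/vp* columns."""
    return baseline_cycles / library_cycles


if __name__ == "__main__":
    xs = random_arguments("D")  # range D is -100,100
    seconds = time_per_evaluation(math.cos, xs)
    print(f"cos over range D: {seconds * 1e9:.1f} ns per evaluation")
    # POWER6 double precision vacos row: libm 207.45, vp6 27.79.
    print(f"vacos libm/vp6 ratio: {speedup(207.45, 27.79):.2f}")
```

Converting the per-evaluation time to cycles would require multiplying by the processor clock frequency, which this sketch leaves out.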