Consistency of AIX MASS vector library functions


Abstract

In the interest of speed, certain MASS vector functions may produce slightly different results for a given input value, depending on its position in the vector, the vector length, and nearby elements of the input vector. In most applications, these inconsistencies will not have any significant impact.

Content

All the functions in the AIX POWER7 MASS vector library (libmassvp7.a) are consistent. Information relating to the consistency of AIX MASS vector functions for other processors is given below.

In the interest of speed, the MASS libraries make certain trade-offs. One of these involves the consistency of certain MASS vector routines. It is possible that the result computed for a particular input value will vary slightly (usually only in the least significant bit) depending on its position in the vector, the vector length, and nearby elements of the input vector. Also, the results produced by the different MASS libraries are not necessarily bit-wise identical. Although most users will likely not be disturbed by this, the behavior is described here to allow you to determine whether it affects your application. (One application that can be affected is the debugging of parallel programs by varying the number of processors and/or the distribution of the data across the processors, while expecting the results to be bit-wise identical.)
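When comparing results across runs, a bit-wise equality test can be replaced by a tolerance measured in units in the last place (ULPs). The following is a minimal sketch, not part of MASS: the helper names ordered_bits, ulp_distance, and within_ulps are hypothetical, and the code assumes 64-bit IEEE 754 doubles and finite (non-NaN) inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a finite double to an integer that grows
   monotonically with the value, so adjacent doubles differ by 1. */
static int64_t ordered_bits(double d) {
    int64_t i;
    assert(sizeof(double) == sizeof(int64_t)); /* assumes 64-bit IEEE doubles */
    memcpy(&i, &d, sizeof i);
    /* Negative doubles have the sign bit set; mirror them below zero. */
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b (0 if identical). */
uint64_t ulp_distance(double a, double b) {
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);
    /* Unsigned subtraction avoids signed overflow for opposite signs. */
    return ia >= ib ? (uint64_t)ia - (uint64_t)ib
                    : (uint64_t)ib - (uint64_t)ia;
}

/* Returns 1 if a and b differ by at most max_ulps, else 0. */
int within_ulps(double a, double b, uint64_t max_ulps) {
    return ulp_distance(a, b) <= max_ulps;
}
```

A check such as within_ulps(y_run1[i], y_run2[i], 1) accepts the least-significant-bit differences described above while still flagging larger discrepancies. The tolerance of 1 is an illustrative choice, not a documented bound.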

The vector MASS routines contain a main loop that computes K (K=4 or 8, depending on the target machine) elements of the output vector per iteration. If the number of input vector elements N is not a multiple of K, a "tail" loop is used to compute the remaining N mod K elements. In the interest of speed, the algorithm used for the tail is not necessarily the same as the one used by the main loop. A consequence of this is that, for certain input values, a slightly different result may be computed, depending on whether the value occurs in the last N mod K elements of the input vector (and hence is computed by the tail loop) or whether it occurs in previous elements (and hence is computed by the main loop).
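The split between the main loop and the tail loop described above can be sketched as simple index arithmetic. This is an illustration only: tail_start and in_tail are hypothetical names, and the actual MASS loop structure is not shown in this document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the first element that would be handled
   by the tail loop for an n-element vector processed k at a time. */
size_t tail_start(size_t n, size_t k) {
    assert(k > 0);
    return n - (n % k);
}

/* Returns 1 if element i of an n-element vector falls in the last
   n mod k elements (the tail), 0 if it falls in the main loop. */
int in_tail(size_t i, size_t n, size_t k) {
    assert(i < n);
    return i >= tail_start(n, k);
}
```

For example, with N=1007 and K=8 the main loop covers elements 0 through 999 and the tail covers the remaining 7 elements, 1000 through 1006. With N=7 and K=8 every element is in the tail.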

Also in the interest of speed, certain special input values (such as extremely small or extremely large arguments) can cause a block of K results to be re-computed with a different algorithm. A possible consequence is that the same input value may produce slightly different results, depending on whether a nearby input value is special. In most applications this should happen rarely.

When the input vector contains no extreme arguments, inconsistency can be avoided by always calling the vector MASS routines with a vector length that is a multiple of 8, padding any unused final positions with a dummy value if necessary. For long vectors, the overhead of the padding is negligible compared to the time required to compute the entire vector. If the input vector does contain extreme values, inconsistent results are still possible even with padding.
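The padding approach can be sketched as a small wrapper. This is a sketch under stated assumptions: padded_length, call_padded, and demo_square are hypothetical names; the vec_fn type assumes a routine of the form f(y, x, &n) with the length passed by pointer; and demo_square is only a stand-in so that the example is self-contained. In real use, the MASS vector routine would be passed instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed routine shape: y[i] = f(x[i]) for i < *n. */
typedef void (*vec_fn)(double y[], double x[], int *n);

/* Round n up to the next multiple of 8. */
int padded_length(int n) {
    assert(n >= 0);
    return (n + 7) / 8 * 8;
}

/* Copy x into a buffer whose length is a multiple of 8, fill the unused
   positions with a harmless dummy value, call f on the padded buffer,
   and copy back only the first n results. Returns 0 on success and -1
   if memory allocation fails. */
int call_padded(vec_fn f, double *y, const double *x, int n, double dummy) {
    int m = padded_length(n);
    if (m == 0) return 0;
    double *xin = malloc((size_t)m * sizeof *xin);
    double *yout = malloc((size_t)m * sizeof *yout);
    if (!xin || !yout) { free(xin); free(yout); return -1; }
    memcpy(xin, x, (size_t)n * sizeof *xin);
    for (int i = n; i < m; i++) xin[i] = dummy; /* pad the final block */
    f(yout, xin, &m);
    memcpy(y, yout, (size_t)n * sizeof *y);
    free(xin);
    free(yout);
    return 0;
}

/* Stand-in for a vector routine, used only for illustration. */
void demo_square(double y[], double x[], int *n) {
    for (int i = 0; i < *n; i++) y[i] = x[i] * x[i];
}
```

The dummy value should be an ordinary argument for the routine being called (for example 1.0 for a square root or logarithm), since an extreme dummy value could itself trigger the re-computation described earlier. A production version would likely allocate the padded buffers once and reuse them rather than allocating on every call.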

Beginning with MASS version 3.3, some routines in libmassvp4.a and later vector libraries are consistent. The consistent routines are as follows:

Version 3.3: vsqrt, vssqrt, vexp, vsexp, vlog, vrec, vdiv, vsin, vcos

Version 3.4 and higher: vsqrt, vssqrt, vexp, vsexp, vlog, vrec, vdiv, vsin, vcos, vacos, vasin, vatan2, vrsqrt, vscos, vsdiv, vsrec, vssin

For long vectors, most of the consistent routines run at the same speed as the routines they replace. (Exceptions are vsin, vcos, vssin, and vscos, for which the inconsistent versions can be faster on some vectors having arguments between approximately 0.78 and 1.) There are also some differences in speed for short vectors (see below). (vsin and vcos are not compared in the table since POWER4-tuned versions were not present prior to MASS version 3.3.)

Average relative elapsed time difference (percent) between MASS version 3.3 (consistent) and version 3.2.1 (non-consistent) routines in libmassvp4.a. (Negative means v3.3 is faster.)

Routine    n=1007    n=7
vsqrt      0         +33
vssqrt     0         -5
vexp       0         -8
vlog       0         +18
vrec       0         +41
vdiv       0         +19

Document Information

Modified date:
12 October 2022

UID

swg27005373