IBM Support

Getting the results of mathematical calculations of PureData System for Analytics NPS to be returned in a specific precision and scale

Question & Answer


Question

The built in AVG function in NPS (Netezza Perfomance Server, the Postgres based RDBMS that is part of IBM's PureData System for Analytics appliance returns a NUMERIC(38,6). How can I get NPS to return the average of a set of values that is a NUMERIC(28,14) ?

Cause

If column col2 contains the values 1, 1, and 2, the average of col2 is 1 and 1/3, and AVG(col2) seems to return that in a NUMERIC(38,6).


    select avg(col2) from onerow ;
      AVG
    ----------
    1.333333
    (1 row)

However, my application is expecting a NUMERIC(28,14). When I cast the output of AVG to a NUMERIC(28,14) I do not get the .3 recurring which is the decimal equivalent of 1/3, but the previous result padded with zeros after the 6th decimal place:

    select avg(col2)::NUMERIC(28,14) from onerow ;
        ?COLUMN?
    ------------------
    1.33333300000000
    (1 row)
To get a NUMERIC(28,14) output to be accurate for all 14 decimal places, you have to ensure that the calculation returns that datatype. An average is SUM(values) divided by COUNT(values), and in NPS the precision and scale of the result of a division operation is a function of the precision and scale of the numerator and denominator. If you divide a number (with precision and scale of P1 and S1) by a number (with precision and scale of P2 and S2) then the precision and scale of the result (P3 and S3) is calculated as follows:
P3 = min(38, (P1 + P2 + S2 + 1))
S3 = max(6, (S1 + P2 + 1))

The actual code inside NPS is more complicated than this, as for small input precisions and scales the +1 is removed, and when (P1 + P2 + S2 + 1) is more than 38, we reduced P3 to 38 and then reduce S3 by the same amount to keep the overall storage the same, unless that makes S3 less than 6 ... but the formulas above are a good general guideline.

So, as the SUM function returns a NUMERIC(38,0), and the COUNT function returns a BIGINT(aka INT8) which has 19 digits of precision and 0 of scale, when we calculate SUM / COUNT the formulas above tell us

P3 = min(38, (38 + 19 + 0 + 1)) = 58
S3 = max(6, (0 + 19 + 1)) = 20

As we can't have a NUMERIC(58,20), the 58 gets reduced (by 20) to the maximum allowed precision (38), and the scale gets reduced by the same amount to compensate. That makes the scale 0, and as that is less than 6, the scale is set to 6. This is why the AVG function returns a NUMERIC(38,6). Casting the output of AVG into a NUMERIC(28,14) will just pad it with another 8 zeros because the precision has already been lost.

To have the output of a division like SUM/COUNT return a different precision and/or scale, we need to change the precision and/or scale of the inputs.

Answer


These divisions all result in a NUMERIC(28,14):

NUMERIC(24,10) / BYTEINT
NUMERIC(22,8) / SMALLINT
NUMERIC(18,4) / INTEGER
NUMERIC(14,0) / NUMERIC(13,0)

Which to use depends on how many values you are averaging, and how big those values are.

If you are averaging fewer that 256 values, the count can be stored in a BYTEINT (aka TINYINT
or INT1, which has precision of 3) and the sum can be up to 99,999,999,999,999.9999999999 as it
has to fit into a NUMERIC(24,10).

If you have up to 32767 values to average, you'll need SMALLINT (aka INT2) for the count. That
has a precision of 5, so for the average to be a NUMERIC (28,14), the sums has to be a NUMERIC(22,8)

And if you have up to 2,147,483,647 values, you'll need an INTEGER (aka INT4 or INT) for the count
and. That has 10 digits of precision so your sum has to be a NUMERIC(18,4).

Note that all the sum datatypes above have 14 digits to the left of the decimal point, and the greater
precision of the counts is enabled by taking away decimal places to the right of the decimal point.
If you're averaging whole numbers only, you can use a NUMERIC(13,0) for your count, but your sum
is limited further to a NUMERIC(14,0).

There is one final integer datatype I haven't suggested to hold the count: BIGINT (aka INT8). I can't
get a NUMERIC(28,14) when dividing by an 8 byte integer. The closest I can come is a NUMERIC(28,19)
if the sum is a NUMERIC(9,0). This would enable you to average up to 9,223,372,036,854,775,807
values, but they'd have to sum to no more than 999,999,999. That means if you had anything like that
many values, the overwhelming majority of them would have to be very small fractions for the sum of
9.2 * 10^18 values to be less than 9.9 * 10^8, and we'd only be able to store the integer part of the sum
anyway, which would then make the result less accurate.

The following SQL demonstrates all of the above. Note especially that avgi8 and avg13 return
1.33333... whereas the rest of the average columns return 1.4666... This is because summing 1.1, 1.1
and 2.2 into a numeric(9,0) or numeric(14,0) rounds 4.4 down to 4.0 - more digits of precision in the divisor
does not necessarily mean a more precise output.


  create table avgvals( inval  numeric(22,11) ) ;
-- create table avgvals( inval  integer ) ;
  insert into avgvals values (1.1) ;
  insert into avgvals values (1.1) ;
  insert into avgvals values (2.2) ;

  create table avgs as
  select  sum(inval)                 as defsum
        , count(*)                   as defcnt
        , avg(inval)                 as defavg
        , avg(inval)::numeric(28,14) as castavg
        , sum(inval)::numeric(24,10) as sum2410
        , count(*)::int1             as cntint1
        , (sum2410 / cntint1)        as avgi1
        , sum(inval)::numeric(22,8)  as sum228
        , count(*)::int2             as cntint2
        , (sum228 / cntint2)         as avgi2
        , sum(inval)::numeric(18,4)  as sum184
        , count(*)::int4             as cntint4
        , (sum184 / cntint4)         as avgi4
        , sum(inval)::numeric(9)     as sum90
        , count(*)::int8             as cntint8
        , (sum90 / cntint8)          as avgi8
        , sum(inval)::numeric(14)    as sum14
        , count(*)::numeric(13)      as cntnum13
        , (sum14 / cntnum13)         as avg13
    from avgvals ;

  \x
  select * from avgs ;
  \x

  \d avgs


Output
======
select * from avgs ;
-[ RECORD 1 ]-------------------
DEFSUM   | 4.40000000000
DEFCNT   | 3
DEFAVG   | 1.466667
CASTAVG  | 1.46666700000000
SUM2410  | 4.4000000000
CNTINT1  | 3
AVGI1    | 1.46666666666667
SUM228   | 4.40000000
CNTINT2  | 3
AVGI2    | 1.46666666666667
SUM184   | 4.4000
CNTINT4  | 3
AVGI4    | 1.46666666666667
SUM90    | 4
CNTINT8  | 3
AVGI8    | 1.3333333333333333333
SUM14    | 4
CNTNUM13 | 3
AVG13    | 1.33333333333333


                     Table "AVGS"
 Attribute |      Type      | Modifier | Default Value
-----------+----------------+----------+---------------
 DEFSUM    | NUMERIC(38,11) |          |
 DEFCNT    | BIGINT         |          |
 DEFAVG    | NUMERIC(38,6)  |          |
 CASTAVG   | NUMERIC(28,14) |          |
 SUM2410   | NUMERIC(24,10) |          |
 CNTINT1   | BYTEINT        |          |
 AVGI1     | NUMERIC(28,14) |          |
 SUM228    | NUMERIC(22,8)  |          |
 CNTINT2   | SMALLINT       |          |
 AVGI2     | NUMERIC(28,14) |          |
 SUM184    | NUMERIC(18,4)  |          |
 CNTINT4   | INTEGER        |          |
 AVGI4     | NUMERIC(28,14) |          |
 SUM90     | NUMERIC(9,0)   |          |
 CNTINT8   | BIGINT         |          |
 AVGI8     | NUMERIC(28,19) |          |
 SUM14     | NUMERIC(14,0)  |          |
 CNTNUM13  | NUMERIC(13,0)  |          |
 AVG13     | NUMERIC(28,14) |          |

__________________________________
William Byrne III

[{"Product":{"code":"SSULQD","label":"IBM PureData System"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Component":null,"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"1.0.0","Edition":"All Editions","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Product Synonym

PDA;Pure Data for Analytics

Document Information

Modified date:
17 October 2019

UID

swg21669116