Question & Answer
Question
The built in AVG function in NPS (Netezza Perfomance Server, the Postgres based RDBMS that is part of IBM's PureData System for Analytics appliance returns a NUMERIC(38,6). How can I get NPS to return the average of a set of values that is a NUMERIC(28,14) ?
Cause
If column col2 contains the values 1, 1, and 2, the average of col2 is 1 and 1/3, and AVG(col2) seems to return that in a NUMERIC(38,6).
select avg(col2) from onerow ;
AVG
----------
1.333333
(1 row)
However, my application is expecting a NUMERIC(28,14). When I cast the output of AVG to a NUMERIC(28,14) I do not get the .3 recurring which is the decimal equivalent of 1/3, but the previous result padded with zeros after the 6th decimal place:
select avg(col2)::NUMERIC(28,14) from onerow ;
?COLUMN?
------------------
1.33333300000000
(1 row)
P3 = min(38, (P1 + P2 + S2 + 1))
S3 = max(6, (S1 + P2 + 1))
The actual code inside NPS is more complicated than this, as for small input precisions and scales the +1 is removed, and when (P1 + P2 + S2 + 1) is more than 38, we reduced P3 to 38 and then reduce S3 by the same amount to keep the overall storage the same, unless that makes S3 less than 6 ... but the formulas above are a good general guideline.
So, as the SUM function returns a NUMERIC(38,0), and the COUNT function returns a BIGINT(aka INT8) which has 19 digits of precision and 0 of scale, when we calculate SUM / COUNT the formulas above tell us
P3 = min(38, (38 + 19 + 0 + 1)) = 58
S3 = max(6, (0 + 19 + 1)) = 20
As we can't have a NUMERIC(58,20), the 58 gets reduced (by 20) to the maximum allowed precision (38), and the scale gets reduced by the same amount to compensate. That makes the scale 0, and as that is less than 6, the scale is set to 6. This is why the AVG function returns a NUMERIC(38,6). Casting the output of AVG into a NUMERIC(28,14) will just pad it with another 8 zeros because the precision has already been lost.
To have the output of a division like SUM/COUNT return a different precision and/or scale, we need to change the precision and/or scale of the inputs.
Answer
These divisions all result in a NUMERIC(28,14):
NUMERIC(24,10) / BYTEINT
NUMERIC(22,8) / SMALLINT
NUMERIC(18,4) / INTEGER
NUMERIC(14,0) / NUMERIC(13,0)
Which to use depends on how many values you are averaging, and how big those values are.
If you are averaging fewer that 256 values, the count can be stored in a BYTEINT (aka TINYINT
or INT1, which has precision of 3) and the sum can be up to 99,999,999,999,999.9999999999 as it
has to fit into a NUMERIC(24,10).
If you have up to 32767 values to average, you'll need SMALLINT (aka INT2) for the count. That
has a precision of 5, so for the average to be a NUMERIC (28,14), the sums has to be a NUMERIC(22,8)
And if you have up to 2,147,483,647 values, you'll need an INTEGER (aka INT4 or INT) for the count
and. That has 10 digits of precision so your sum has to be a NUMERIC(18,4).
Note that all the sum datatypes above have 14 digits to the left of the decimal point, and the greater
precision of the counts is enabled by taking away decimal places to the right of the decimal point.
If you're averaging whole numbers only, you can use a NUMERIC(13,0) for your count, but your sum
is limited further to a NUMERIC(14,0).
There is one final integer datatype I haven't suggested to hold the count: BIGINT (aka INT8). I can't
get a NUMERIC(28,14) when dividing by an 8 byte integer. The closest I can come is a NUMERIC(28,19)
if the sum is a NUMERIC(9,0). This would enable you to average up to 9,223,372,036,854,775,807
values, but they'd have to sum to no more than 999,999,999. That means if you had anything like that
many values, the overwhelming majority of them would have to be very small fractions for the sum of
9.2 * 10^18 values to be less than 9.9 * 10^8, and we'd only be able to store the integer part of the sum
anyway, which would then make the result less accurate.
The following SQL demonstrates all of the above. Note especially that avgi8 and avg13 return
1.33333... whereas the rest of the average columns return 1.4666... This is because summing 1.1, 1.1
and 2.2 into a numeric(9,0) or numeric(14,0) rounds 4.4 down to 4.0 - more digits of precision in the divisor
does not necessarily mean a more precise output.
create table avgvals( inval numeric(22,11) ) ;
-- create table avgvals( inval integer ) ;
insert into avgvals values (1.1) ;
insert into avgvals values (1.1) ;
insert into avgvals values (2.2) ;
create table avgs as
select sum(inval) as defsum
, count(*) as defcnt
, avg(inval) as defavg
, avg(inval)::numeric(28,14) as castavg
, sum(inval)::numeric(24,10) as sum2410
, count(*)::int1 as cntint1
, (sum2410 / cntint1) as avgi1
, sum(inval)::numeric(22,8) as sum228
, count(*)::int2 as cntint2
, (sum228 / cntint2) as avgi2
, sum(inval)::numeric(18,4) as sum184
, count(*)::int4 as cntint4
, (sum184 / cntint4) as avgi4
, sum(inval)::numeric(9) as sum90
, count(*)::int8 as cntint8
, (sum90 / cntint8) as avgi8
, sum(inval)::numeric(14) as sum14
, count(*)::numeric(13) as cntnum13
, (sum14 / cntnum13) as avg13
from avgvals ;
\x
select * from avgs ;
\x
\d avgs
Output
======
select * from avgs ;
-[ RECORD 1 ]-------------------
DEFSUM | 4.40000000000
DEFCNT | 3
DEFAVG | 1.466667
CASTAVG | 1.46666700000000
SUM2410 | 4.4000000000
CNTINT1 | 3
AVGI1 | 1.46666666666667
SUM228 | 4.40000000
CNTINT2 | 3
AVGI2 | 1.46666666666667
SUM184 | 4.4000
CNTINT4 | 3
AVGI4 | 1.46666666666667
SUM90 | 4
CNTINT8 | 3
AVGI8 | 1.3333333333333333333
SUM14 | 4
CNTNUM13 | 3
AVG13 | 1.33333333333333
Table "AVGS"
Attribute | Type | Modifier | Default Value
-----------+----------------+----------+---------------
DEFSUM | NUMERIC(38,11) | |
DEFCNT | BIGINT | |
DEFAVG | NUMERIC(38,6) | |
CASTAVG | NUMERIC(28,14) | |
SUM2410 | NUMERIC(24,10) | |
CNTINT1 | BYTEINT | |
AVGI1 | NUMERIC(28,14) | |
SUM228 | NUMERIC(22,8) | |
CNTINT2 | SMALLINT | |
AVGI2 | NUMERIC(28,14) | |
SUM184 | NUMERIC(18,4) | |
CNTINT4 | INTEGER | |
AVGI4 | NUMERIC(28,14) | |
SUM90 | NUMERIC(9,0) | |
CNTINT8 | BIGINT | |
AVGI8 | NUMERIC(28,19) | |
SUM14 | NUMERIC(14,0) | |
CNTNUM13 | NUMERIC(13,0) | |
AVG13 | NUMERIC(28,14) | |
__________________________________
William Byrne III
Product Synonym
PDA;Pure Data for Analytics
Was this topic helpful?
Document Information
Modified date:
17 October 2019
UID
swg21669116