Re: Sum aggregate calculation for single precsion real

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sum aggregate calculation for single precsion real
Date: 2017-02-13 16:45:45
Message-ID: fd9e577d-6f58-134b-79aa-61d96ee52183@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 13.02.2017 19:20, Tom Lane wrote:
> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>> I wonder why SUM aggregate is calculated for real (float4) type using
>> floating point accumulator?
> If you can't deal with the vagaries of floating-point arithmetic, you
> shouldn't be storing your data in float format. Use numeric.

4-byte floats are widely used for example in trading applications just
because it is two times shorter then double and range of stored data is
relatively small (do not need a lot of significant digits). At the same
time volume of stored data is very large and switching from float4 to
float8 will almost double it. It requires two times more storage and
almost two times increase query execution time.
So this is not acceptable answer.

>
>> Are there are reasons of using float4pl function for SUM aggregate instead of float4_accum?
> The latter is probably a good two orders of magnitude slower, and it
> wouldn't really do much to solve the inherent accuracy problems of
> adding float4 values that have a wide dynamic range.

It is not true - please notice query execution time of this two queries:

postgres=# select sum(l_quantity) from lineitem where l_shipdate <=
'1998-12-01';
sum
-------------
1.52688e+09
(1 row)

Time: 2858.852 ms
postgres=# select sum(l_quantity+0.0) from lineitem where l_shipdate <=
'1998-12-01';
sum
------------
1529738036
(1 row)

Time: 3174.529 ms

Looks like now in Postgres aggregate calculation itself is not a
bottleneck, comparing with tuple deform cost.

> The expectation for SUM(float4) is that you want speed and are
> prepared to cope with the consequences. It's easy enough to cast your
> input to float8 if you want a wider accumulator, or to numeric if
> you'd like more stable (not necessarily more accurate :-() results.
> I do not think it's the database's job to make those choices for you.

From my point of your it is strange and wrong expectation.
I am choosing "float4" type for a column just because it is enough to
represent range of data I have and I need to minimize size of record.
But when I am calculating sum, I expect to receive more or less precise
result. Certainly I realize that even in case of using double it is
possible to loose precision while calculation and result may depend on
sum order (if we add very small and very larger values). But in real use
cases (for example in trading data) such large difference in attribute
values is very rare. If you have, for example, stock price, then it is
very unlikely that one company has value 0.000001 and another 10000000.0
At least in TPC-H example (which certainly deal with dummy generated
data), double type produce "almost price" result.

>
> regards, tom lane

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-02-13 16:52:53 Re: amcheck (B-Tree integrity checking tool)
Previous Message Fujii Masao 2017-02-13 16:43:59 Re: [COMMITTERS] pgsql: Remove all references to "xlog" from SQL-callable functions in p