From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sum aggregate calculation for single precsion real
Date: 2017-02-13 16:45:45
Message-ID: fd9e577d-6f58-134b-79aa-61d96ee52183@postgrespro.ru


Lists: pgsql-hackers

On 13.02.2017 19:20, Tom Lane wrote:
> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>> I wonder why SUM aggregate is calculated for real (float4) type using
>> floating point accumulator?
> If you can't deal with the vagaries of floating-point arithmetic, you
> shouldn't be storing your data in float format. Use numeric.

4-byte floats are widely used, for example in trading applications,
simply because they are half the size of doubles and the range of the
stored data is relatively small (not many significant digits are
needed). At the same time the volume of stored data is very large, so
switching from float4 to float8 would require twice the storage and
almost double the query execution time.

So this is not an acceptable answer.

>
>> Are there are reasons of using float4pl function for SUM aggregate
>> instead of float4_accum?
> The latter is probably a good two orders of magnitude slower, and it
> wouldn't really do much to solve the inherent accuracy problems of
> adding float4 values that have a wide dynamic range.

That is not true: please note the execution times of these two queries:

postgres=# select sum(l_quantity) from lineitem where l_shipdate <=
'1998-12-01';
     sum
-------------
 1.52688e+09
(1 row)

Time: 2858.852 ms

postgres=# select sum(l_quantity+0.0) from lineitem where l_shipdate <=
'1998-12-01';
    sum
------------
 1529738036
(1 row)

Time: 3174.529 ms

It looks like the aggregate calculation itself is no longer a
bottleneck in Postgres, compared with the cost of deforming tuples.

> The expectation for SUM(float4) is that you want speed and are
> prepared to cope with the consequences. It's easy enough to cast your
> input to float8 if you want a wider accumulator, or to numeric if
> you'd like more stable (not necessarily more accurate :-() results.
> I do not think it's the database's job to make those choices for you.

From my point of view this is a strange and wrong expectation.
I chose the "float4" type for a column just because it is enough to
represent the range of data I have, and I need to minimize record size.
But when I calculate a sum, I expect to get a more or less precise
result. Certainly I realize that even with double it is possible to
lose precision during the calculation, and the result may depend on
summation order (if we add very small and very large values). But in
real use cases (for example trading data) such a large spread in
attribute values is very rare. If you have, for example, stock prices,
it is very unlikely that one company's is 0.000001 and another's
10000000.0. At least on the TPC-H data (which, granted, is dummy
generated data), the double type produces an almost precise result.

>
> regards, tom lane

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

- Re: Sum aggregate calculation for single precsion real at 2017-02-13 16:20:49 from Tom Lane

- Re: Sum aggregate calculation for single precsion real at 2017-02-14 13:59:00 from Jim Nasby
