Re: Parallel Aggregate

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-01-23 01:59:33
Message-ID: CAJrrPGdqCb06j2qNag2aQUyMN4aEDL7pqeDkYLfHgY8YN+rBzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 22, 2016 at 10:13 PM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 22 January 2016 at 17:25, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>> Along with these changes, I added a float8 combine function to see
>> how it works under parallel aggregate, it is working fine for float4, but
>> giving small data mismatch with float8 data type.
>>
>> postgres=# select avg(f3), avg(f4) from tbl;
>> avg | avg
>> ------------------+------------------
>> 1.10000002384186 | 100.123449999879
>> (1 row)
>>
>> postgres=# set enable_parallelagg = true;
>> SET
>> postgres=# select avg(f3), avg(f4) from tbl;
>> avg | avg
>> ------------------+------------------
>> 1.10000002384186 | 100.123449999918
>> (1 row)
>>
>> Column - f3 - float4
>> Column - f4 - float8
>>
>> similar problem for all float8 var_pop, var_samp, stddev_pop and stddev_samp
>> aggregates. Any special care is needed for float8 datatype?
>
> I'm not sure if this is what's going on here, as I don't really know
> the range of numbers that you've used to populate f4 with. It would be
> good to know, does "f4" contain negative values too?

No negative values are present in the f4 column.
Following are the SQL statements,

create table tbl(f1 int, f2 char(100), f3 float4, f4 float8);
insert into tbl values(generate_series(1,100000), 'Fujitsu', 1.1, 100.12345);

> It's not all that hard to demonstrate the instability of addition with
> float8. Take the following example:
>
> create table d (d float8);
> insert into d values(1223123223412324.2231),(0.00000000000023),(-1223123223412324.2231);
>
> # select sum(d order by random()) from d;
> sum
> -----
> 0
> (1 row)
>
> same query, once more.
>
> # select sum(d order by random()) from d;
> sum
> ----------
> 2.3e-013
> (1 row)
>
> Here the result just depends on the order which the numbers have been
> added. You may need to execute a few more times to see the result
> change.
>
> Perhaps a good test would be to perform a sum(f4 order by random()) in
> serial mode, and see if you're getting a stable result from the
> numbers that you have populated the table with.
>
> If that's the only problem at play here, then I for one am not worried
> about it, as the instability already exists today depending on which
> path is chosen to scan the relation. For example an index scan is
> likely not to return rows in the same order as a seq scan.
>
> We do also warn about this in the manual: "Inexact means that some
> values cannot be converted exactly to the internal format and are
> stored as approximations, so that storing and retrieving a value might
> show slight discrepancies. Managing these errors and how they
> propagate through calculations is the subject of an entire branch of
> mathematics and computer science and will not be discussed here,
> except for the following points:" [1]
>
>
> [1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html
>

Thanks for the detailed explanation. Now I understood.

Here I attached updated patch with additional combine function for
two stage aggregates also.

Regards,
Hari Babu
Fujitsu Australia

Attachment Content-Type Size
additional_combine_fns_v1.patch application/octet-stream 22.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2016-01-23 02:39:50 Re: silent data loss with ext4 / all current versions
Previous Message Tom Lane 2016-01-23 01:40:23 Re: Proposal: Trigonometric functions in degrees