Re: Parallel Aggregate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-01-22 11:13:38
Message-ID: CAKJS1f_HpLFhkd2yLfrsrmUMBZWQkGvJCWX21B_xg1A-0pzwHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22 January 2016 at 17:25, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> wrote:
> Along with these changes, I added a float8 combine function to see
> how it works under parallel aggregate, it is working fine for float4, but
> giving small data mismatch with float8 data type.
>
> postgres=# select avg(f3), avg(f4) from tbl;
> avg | avg
> ------------------+------------------
> 1.10000002384186 | 100.123449999879
> (1 row)
>
> postgres=# set enable_parallelagg = true;
> SET
> postgres=# select avg(f3), avg(f4) from tbl;
> avg | avg
> ------------------+------------------
> 1.10000002384186 | 100.123449999918
> (1 row)
>
> Column - f3 - float4
> Column - f4 - float8
>
> similar problem for all float8 var_pop, var_samp, stddev_pop and stddev_samp
> aggregates. Any special care is needed for float8 datatype?

I'm not sure if this is what's going on here, as I don't really know
the range of numbers that you've used to populate f4 with. It would be
good to know, does "f4" contain negative values too?

It's not all that hard to demonstrate the instability of addition with
float8. Take the following example:

create table d (d float8);
insert into d values(1223123223412324.2231),(0.00000000000023),(-1223123223412324.2231);

# select sum(d order by random()) from d;
sum
-----
0
(1 row)

same query, once more.

# select sum(d order by random()) from d;
sum
----------
2.3e-013
(1 row)

Here the result just depends on the order which the numbers have been
added. You may need to execute a few more times to see the result
change.

Perhaps a good test would be to perform a sum(f4 order by random()) in
serial mode, and see if you're getting a stable result from the
numbers that you have populated the table with.

If that's the only problem at play here, then I for one am not worried
about it, as the instability already exists today depending on which
path is chosen to scan the relation. For example an index scan is
likely not to return rows in the same order as a seq scan.

We do also warn about this in the manual: "Inexact means that some
values cannot be converted exactly to the internal format and are
stored as approximations, so that storing and retrieving a value might
show slight discrepancies. Managing these errors and how they
propagate through calculations is the subject of an entire branch of
mathematics and computer science and will not be discussed here,
except for the following points:" [1]

[1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-01-22 12:32:29 Re: silent data loss with ext4 / all current versions
Previous Message Aleksander Alekseev 2016-01-22 10:48:37 Re: Patch: fix lock contention for HASHHDR.mutex