Re: BUG #15869: Custom aggregation returns null when parallelized

From: Kassym Dorsel <k(dot)dorsel(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15869: Custom aggregation returns null when parallelized
Date: 2019-06-24 16:07:20
Message-ID: CAKTpVaayY-rN3s1uFzgOvnmXUtS1ovm=Nsk0Br_Toq+NLBfNTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Right, adding the Gather node makes it use the combine func and this is
where the problem is.

You're right on handling of null values in my combine function. Since this
was being run on a table with 150k rows, I had assumed that the contents of
my aggregate types would never be null/empty.

Thinking about it, it would make sense to receive an aggregate type with
count = 0 or null iff there is 1 worker (1 result to combine the other
being null/empty). When there are 2 or more workers I would assume that
rows would be relatively evenly split and the return of my aggregate type
would be filled given the 150k rows. I tried with 1,2,3,4 workers (ALTER
TABLE temp SET (parallel_workers = 1,2,3,4);) and got the same null results
before adding support for null values.

Is this expected behavior when number of workers is >=2? An explicit
paragraph in parallel aggregates documentation outlining null support in
combine func might be helpful.

Regardless, adding support for null/empty values has fixed my problem and
now the aggregate correctly works in parallel queries. Many thanks.

Best,
Kassym

On Sun, Jun 23, 2019 at 10:51 PM David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> On Mon, 24 Jun 2019 at 03:32, PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
> > Here's the setup code:
> > --CREATE TABLE temp (val double precision);
> > --insert into temp (val) select i from generate_series(0, 150000) as
> t(i);
> > --set force_parallel_mode = on;
> > select (stats_agg(val)).* from temp;
>
> I don't think force_parallel_mode does what you think it does. It just
> adds a Gather node to the top of the plan, if the plan is deemed
> parallel safe. It's not going to force your aggregate to be
> parallelised.
>
> You might coax the planner into generating a parallel aggregate plan
> by setting parallel_tuple_cost and parallel_setup_cost both to 0.
>
> > Expected results:
> > 150001, 37500, 75000, 112500
> >
> > Results when run in parallel:
> > 150001, null, null, null
>
> Are you actually getting a partial and finalize aggregate node with
> that? Can you show the EXPLAIN output of each?
>
> You might also want to double check your combine function. It does not
> look like it's very well coded to handle NULL values for arrays that
> have yet to receive their fill of 5 elements.
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-06-24 19:05:30 Re: BUG #15865: ALTER TABLE statements causing "relation already exists" errors when some indexes exist
Previous Message Sergei Kornilov 2019-06-24 11:59:37 Re: BUG #15870: You can't see the table after executing the table-building statement with \i