Re: BUG #15869: Custom aggregation returns null when parallelized

From: Kassym Dorsel <k(dot)dorsel(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15869: Custom aggregation returns null when parallelized
Date: 2019-06-25 15:07:13
Message-ID: CAKTpVaYSK=2tav0+K3uZkGkLYokGW1Gwi7F=6wdEN4Mdytpo8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Yes, the issue comes from having the array in my state have zero elements.
However why does an aggregate worker end up with a state in which zero
records have been passed to it?

In the case of 1 worker process, this makes sense to me. The combine func
takes in two states, the one filled by the single worker process and
another dummy empty state. In the case of 2 or more worker process I would
think that all states would process some rows and not be left initialized
and never used. Is the fact that when there are 2 or more workers that one
or more aggregation states doesn't receive any data to process a bug? Or is
this non deterministic behavior to be expected?

My mistake was making the assumption that an initialized state would always
receive data and would not be passed to a combine func without having
processed any data.

This is the part that would be nice to have in the documentation. In an
aggregation with a gather node an initialized worker state may never be
passed any data to process and would thus keep its initialized state when
being passed to the combine func.

Best,
Kassym

On Mon, Jun 24, 2019 at 6:59 PM David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> On Tue, 25 Jun 2019 at 04:07, Kassym Dorsel <k(dot)dorsel(at)gmail(dot)com> wrote:
> > Right, adding the Gather node makes it use the combine func and this is
> where the problem is.
>
> You're mixing up Gather and Parallel Aggregates. Setting
> force_parallel_mode to on does not force the aggregate to be
> parallelised. It just tries to inject a Gather node at the top of the
> plan. I think it was really meant just to test the tuple queues for
> parallel query back in 9.6. You're certainly not the only person to
> have been confused by it.
>
> > You're right on handling of null values in my combine function. Since
> this was being run on a table with 150k rows, I had assumed that the
> contents of my aggregate types would never be null/empty.
> >
> > Thinking about it, it would make sense to receive an aggregate type with
> count = 0 or null iff there is 1 worker (1 result to combine the other
> being null/empty). When there are 2 or more workers I would assume that
> rows would be relatively evenly split and the return of my aggregate type
> would be filled given the 150k rows. I tried with 1,2,3,4 workers (ALTER
> TABLE temp SET (parallel_workers = 1,2,3,4);) and got the same null results
> before adding support for null values.
> >
> > Is this expected behavior when number of workers is >=2? An explicit
> paragraph in parallel aggregates documentation outlining null support in
> combine func might be helpful.
>
> I don't think anyone would be opposed to improving the documents, but
> in this case, it's not the state that was NULL. You don't need to deal
> with that since you made your combine function strict. It was your
> array elements that were NULL and "<value> <op> NULL" yielding NULL is
> fairly fundamental to SQL, not really specific to aggregation. Your
> initcond made the q[] array an empty array, so trying to fetch an
> element that does not exist will yield NULL. You wouldn't have had the
> issue if you'd set all those array elements to 0 in the initcond, but
> I've not taken the time to understand your transfn to know if that's
> valid. If you've added NULL handling in the combinefn, then that's
> likely fine.
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2019-06-25 21:30:44 BUG #15873: Attaching a partition fails because it sees deleted columns
Previous Message Tom Lane 2019-06-25 15:02:09 Weird index ordering in psql's \d (was Re: BUG #15865: ALTER TABLE statements causing "relation already exists" errors when some indexes exist)