Re: Parallel Aggregate

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Parallel Aggregate
Date: 2016-03-20 20:47:14
Message-ID: e283ddb2-ac35-c169-4c04-c3949d5f46a9@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 03/20/2016 09:58 AM, David Rowley wrote:
> On 20 March 2016 at 03:19, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, Mar 18, 2016 at 10:32 PM, David Rowley
>> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>>> Updated patch is attached.
>>
>> I think this looks structurally correct now, and I think it's doing
>> the right thing as far as parallelism is concerned. I don't see any
>> obvious problems in the rest of it, either, but I haven't thought
>> hugely deeply about the way you are doing the costing, nor have I
>> totally convinced myself that all of the PathTarget and setrefs stuff
>> is correct. But I think it's probably pretty close. I'll study it
>> some more next week.
>
> Thank you for the reviews. The only thing I can think to mention which
> I've not already is that I designed estimate_hashagg_tablesize() to be
> reusable in various places in planner.c, yet I've only made use of it
> in create_grouping_paths(). I would imagine that it might be nice to
> also modify the other places which perform a similar calculation to
> use that function, but I don't think that needs to be done for this
> patch... perhaps a follow-on cleanup.

Hmmm, so how many places that could use the new function are there? I've
only found these two:

create_distinct_paths (planner.c)
choose_hashed_setop (prepunion.c)

That doesn't seem like an extremely high number, so perhaps doing it in
this patch would be fine. However if the function is defined as static,
choose_hashed_setop won't be able to use it anyway (well, it'll have to
move the prototype into a header and remove the static).

I wonder why we would need to allow cost_agg=NULL, though? I mean, if
there are no costing information, wouldn't it be better to either raise
an error, or at the very least do something like:

} else
hashentrysize += hash_agg_entry_size(0);

which is what create_distinct_paths does?

I'm not sure changing the meaning of enable_hashagg like this is a good
idea. It worked as a hard switch before, while with this change that
would not be the case. Or more accurately - it would not be the case for
aggregates, but it would still work the old way for other types of
plans. Not sure that's a particularly good idea.

What about introducing a GUC to enable parallel aggregate, while still
allowing other types of parallel nodes (I guess that would be handy for
other types of parallel nodes - it's a bit blunt tool, but tweaking
max_parallel_degree is even blumter)? e.g. enable_parallelagg?

I do also have a question about parallel aggregate vs. work_mem.
Nowadays we mostly say to users a query may allocate a multiple of
work_mem, up to one per node in the plan. Apparently with parallel
aggregate we'll have to say "multiplied by number of workers", because
each aggregate worker may allocate up to hashaggtablesize. Is that
reasonable? Shouldn't we restrict the total size of hash tables in all
workers somehow?

create_grouping_paths also contains this comment:

/*
* Generate HashAgg Path providing the estimated hash table size is not
* too big, although if no other Paths were generated above, then we'll
* begrudgingly generate one so that we actually have a Path to work
* with.
*/

I'm not sure this is a particularly clear comment, I think the old one
was way much informative despite being a single line:

/* Consider reasons to disable hashing, but only if we can sort instead */

BTW create_grouping_paths probably grew to a size when splitting it into
smaller pieces would be helpful?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-03-20 21:27:44 Re: [patch] Proposal for \crosstabview in psql
Previous Message Thom Brown 2016-03-20 19:55:26 Breakage with VACUUM ANALYSE + partitions