Re: Parallel Aggregate

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-21 00:01:34
Message-ID: 36a4eec3-bd4e-5612-cc06-992714be3fe9@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 03/21/2016 12:30 AM, David Rowley wrote:
> On 21 March 2016 at 09:47, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> ...
>> I'm not sure changing the meaning of enable_hashagg like this is a
>> good idea. It worked as a hard switch before, while with this
>> change that would not be the case. Or more accurately - it would
>> not be the case for aggregates, but it would still work the old way
>> for other types of plans. Not sure that's a particularly good
>> idea.
>
> Hmm, I don't see how it was a hard switch before. If we were unable
> to sort by the group by clause then hashagg would magically be
> enabled.

Sure, but that's not what I meant by the "hard switch" (sorry for the
inaccuracy). Let's assume we can actually do the sorted aggregate. In
that case enable_hashagg=off entirely skipped the hashagg path,
irrespective of cost. With the new code we will add the path, and it
may actually win on the basis of cost (e.g. if there's enable_sort=off
at the same time).

But I'm not convinced this is actually wrong, I merely pointed out the
behavior is not exactly the same and may have unintended consequences.
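The difference can be sketched with a toy cost model (Python, purely illustrative -- the real decision happens in the planner's path machinery, and the 1.0e10 penalty mirrors PostgreSQL's disable_cost but is an assumption here):

```python
# Toy sketch: a "disabled" path is not removed, it just gets a huge
# startup-cost penalty, so it can still win if every alternative is
# penalized too.
DISABLE_COST = 1.0e10  # stands in for PostgreSQL's disable_cost

def path_cost(base_cost, disabled):
    # enable_* = off doesn't skip the path; it inflates its cost
    return base_cost + (DISABLE_COST if disabled else 0.0)

def cheapest(paths):
    return min(paths, key=lambda p: p[1])

# enable_hashagg = off, enable_sort = on: the sorted path wins easily
paths = [("hashagg", path_cost(100.0, disabled=True)),
         ("sort+group", path_cost(500.0, disabled=False))]
print(cheapest(paths)[0])  # sort+group

# enable_hashagg = off AND enable_sort = off: both carry the penalty,
# so the raw costs decide -- now the hashed path can win
paths = [("hashagg", path_cost(100.0, disabled=True)),
         ("sort+group", path_cost(500.0, disabled=True))]
print(cheapest(paths)[0])  # hashagg
```

The same model also shows the "drowned out" case: once base costs approach the penalty's magnitude, the disabled flag stops being decisive.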

> The reason I did this was to simplify the logic in
> create_grouping_paths(). What difference do you imagine that there
> actually is here?

That the hashed path may win over the sorted path, as explained above.

> The only thing I can think of is; we now generate a hashagg path
> where we previously didn't. This has disable_cost added to the
> startup_cost so is quite unlikely to win. Perhaps there are some
> differences if someone did SET enable_sort = false; SET
> enable_hashagg = false; I'm not sure if we should be worried there
> though. Also maybe there's going to be a difference if the plan
> costings were so high that disable_cost was drowned out by the other
> costings.

Ah, I see you came to the same conclusion ...

>
> Apart from that, it would actually be nice to be consistent with
> the enable_* GUCs, as to my knowledge the others all just add
> disable_cost to the startup_cost of the path.

Perhaps.

>
>> What about introducing a GUC to enable parallel aggregate, while
>> still allowing other types of parallel nodes (I guess that would be
>> handy for other types of parallel nodes - it's a bit blunt tool,
>> but tweaking max_parallel_degree is even blunter)? e.g.
>> enable_parallelagg?
>
> Haribabu had this in his version of the patch and I didn't really
> understand the need for it, I assumed it was for testing only. We
> don't have enable_parallelseqscan, and would we plan on adding GUCs
> each time we enable a node for parallelism? I really don't think so,
> we already have parallel hash join and nested loop join without GUCs
> to disable them. I see no reason to add them there, and I don't
> see one here either.

I'm not so sure about that. I certainly think we'll be debugging queries
in the not-so-distant future, wishing for such GUCs.

>> I do also have a question about parallel aggregate vs. work_mem.
>> Nowadays we mostly say to users a query may allocate a multiple of
>> work_mem, up to one per node in the plan. Apparently with parallel
>> aggregate we'll have to say "multiplied by number of workers", because
>> each aggregate worker may allocate up to hashaggtablesize. Is that
>> reasonable? Shouldn't we restrict the total size of hash tables in
>> all workers somehow?
>
> I did think about this, but thought either:
>
> 1) that a project wide decision should be made on how to handle this,
> not just for parallel aggregate, but parallel hash join too, which,
> as I understand it, for now builds an identical hashtable per worker.
>
> 2) the limit is per node, per connection, and parallel aggregates have
> multiple connections, so we might not be breaking our definition of
> how to define work_mem, since we're still limited by max_connections
> anyway.
>

I do agree this is not just a question for parallel aggregate, and that
perhaps we're not breaking the definition. But I think in practice we'll
be hitting the memory limits much more often, because parallel queries
are very synchronized (running the same node on all parallel workers at
exactly the same time).

Not sure if we have to reflect that in the planner, but it will probably
impact what work_mem values are considered safe.
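For a rough sense of scale, the worst case is simple multiplication (Python; all the numbers below are made up for illustration, and counting the leader alongside the workers is an assumption):

```python
# Back-of-the-envelope worst-case memory for hash-based plan nodes:
# each parallel process may build its own hash table up to ~work_mem.
work_mem_mb = 64    # per-node, per-process limit (assumed setting)
hash_nodes = 2      # e.g. a hash join plus a hash aggregate
workers = 4         # parallel workers, each running the same nodes

serial_worst_case = work_mem_mb * hash_nodes
# workers + 1 assumes the leader also participates
parallel_worst_case = work_mem_mb * hash_nodes * (workers + 1)

print(serial_worst_case)    # 128 (MB)
print(parallel_worst_case)  # 640 (MB)
```

The multiplier matters more than in the serial case precisely because the workers hit the same node at the same time, so the peaks coincide rather than being spread out.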

>> create_grouping_paths also contains this comment:
>>
>> /*
>> * Generate HashAgg Path providing the estimated hash table size is not
>> * too big, although if no other Paths were generated above, then we'll
>> * begrudgingly generate one so that we actually have a Path to work
>> * with.
>> */
>>
>> I'm not sure this is a particularly clear comment; I think the old one
>> was way more informative despite being a single line:
>>
>> /* Consider reasons to disable hashing, but only if we can sort instead */
>
> hmm, I find it quite clear, but perhaps that's because I wrote the
> code. I'm not really sure what you're not finding clear about it to be
> honest. Tom's original comment was quite generic to allow for more
> reasons, but I removed one of those reasons by simplifying the logic
> around enable_hashagg, so I didn't think Tom's comment suited well
> anymore.
>
> I've rewritten the comment to become:
>
> /*
> * Providing that the estimated size of the hashtable does not exceed
> * work_mem, we'll generate a HashAgg Path, although if we were unable
> * to sort above, then we'd better generate a Path, so that we at least
> * have one.
> */
>
> How about that?

OK

>
>> BTW create_grouping_paths probably grew to a size where splitting it into
>> smaller pieces would be helpful?
>
> I'd rather not. Amit mentioned this too [1]. See 4A. Robert has marked
> it as ready for committer, so I really don't want to start hacking it
> up too much at this stage unless Robert requests so.

OK

>
> An updated patch is attached. This hopefully addresses your concerns
> with the comment, and also the estimate_hashagg_tablesize() NULL
> checking.
>
> [1] http://www.postgresql.org/message-id/CAKJS1f80=f-z1CUU7=QDmn0r=_yeU7paN2dZ6rQSnUpfEFOUNw@mail.gmail.com

thanks

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
