Re: Parallel grouping sets

From: Jesse Zhang <sbjesse(at)gmail(dot)com>
To: Richard Guo <riguo(at)pivotal(dot)io>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pengzhou Tang <ptang(at)pivotal(dot)io>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel grouping sets
Date: 2020-02-03 19:53:56
Message-ID: CAGf+fX4SDEqbPwT9_mfJLDXkBKQTT6O-wAHeKvUAViE+56n2Gw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 3, 2020 at 12:07 AM Richard Guo <riguo(at)pivotal(dot)io> wrote:
>
> Hi Jesse,
>
> Thanks for reviewing these two patches.
I enjoyed it!

>
> On Sat, Jan 25, 2020 at 6:52 AM Jesse Zhang <sbjesse(at)gmail(dot)com> wrote:
>>
>>
>> I glanced over both patches. Just the opposite, I have a hunch that v3
>> is always better than v5. Here's my 6-minute understanding of both.
>>
>> v3 ("the one with grouping set id") really turns the plan from a tree to
>> a multiplexed pipe: we can execute grouping aggregate on the workers,
>> but only partially. When we emit the trans values, also tag the tuple
>> with a group id. After gather, finalize the aggregates with a modified
>> grouping aggregate. Unlike a non-split grouping aggregate, the finalize
>> grouping aggregate does not "flow" the results from one rollup to the
>> next one. Instead, each group only advances on partial inputs tagged for
>> the group.
>>
>
> Yes, this is what v3 patch does.
>
> We (Pengzhou and I) had an offline discussion on this plan and we have
> some other idea. Since we have tagged 'GroupingSetId' for each tuple
> produced by partial aggregate, why not then perform a normal grouping
> sets aggregation in the final phase, with the 'GroupingSetId' included
> in the group keys? The plan looks like:
>
> # explain (costs off, verbose)
> select c1, c2, c3, avg(c3) from gstest group by grouping sets((c1,c2),(c1),(c2,c3));
> QUERY PLAN
> ------------------------------------------------------------------
> Finalize GroupAggregate
> Output: c1, c2, c3, avg(c3)
> Group Key: (gset_id), gstest.c1, gstest.c2, gstest.c3
> -> Sort
> Output: c1, c2, c3, (gset_id), (PARTIAL avg(c3))
> Sort Key: (gset_id), gstest.c1, gstest.c2, gstest.c3
> -> Gather
> Output: c1, c2, c3, (gset_id), (PARTIAL avg(c3))
> Workers Planned: 4
> -> Partial HashAggregate
> Output: c1, c2, c3, gset_id, PARTIAL avg(c3)
> Hash Key: gstest.c1, gstest.c2
> Hash Key: gstest.c1
> Hash Key: gstest.c2, gstest.c3
> -> Parallel Seq Scan on public.gstest
> Output: c1, c2, c3
>
> This plan should be able to give the correct results. We are still
> thinking if it is a better plan than the 'multiplexed pipe' plan as in
> v3. Inputs of thoughts here would be appreciated.

Ha, I believe you meant to say a "normal aggregate", because what's
performed above gather is no longer "grouping sets", right?

The group key idea is clever in that it helps "discriminate" tuples by
their grouping set id. I haven't completely thought this through, but my
hunch is that this leaves some money on the table, for example, won't it
also lead to more expensive (and unnecessary) sorting and hashing? The
groupings with a few partials are now sharing the same tuplesort with
the groupings with a lot of groups even though we only want to tell
grouping 1 *apart from* grouping 10, not neccessarily that grouping 1
needs to come before grouping 10. That's why I like the multiplexed pipe
/ "dispatched by grouping set id" idea: we only pay for sorting (or
hashing) within each grouping. That said, I'm open to the criticism that
keeping multiple tuplesort and agg hash tabes running is expensive in
itself, memory-wise ...

Cheers,
Jesse

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2020-02-03 20:56:11 Re: Global temporary tables
Previous Message Pavel Stehule 2020-02-03 19:46:00 Re: [Proposal] Global temporary tables