Re: Parallel grouping sets

From: Jesse Zhang <sbjesse(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Richard Guo <riguo(at)pivotal(dot)io>, Michael Paquier <michael(at)paquier(dot)xyz>, Pengzhou Tang <ptang(at)pivotal(dot)io>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel grouping sets
Date: 2020-01-24 22:51:56
Message-ID: CAGf+fX5288z0Gt_UL9FYHvNcySvG1wrKxPQ7+zretbcaUDc5Jw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 23, 2020 at 2:47 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sun, Jan 19, 2020 at 2:23 PM Richard Guo <riguo(at)pivotal(dot)io> wrote:
> >
> > I realized that there are two patches in this thread that are
> > implemented according to different methods, which causes confusion.
> >
>
> Both the idea seems to be different. Is the second approach [1]
> inferior for any case as compared to the first approach? Can we keep
> both approaches for parallel grouping sets, if so how? If not, then
> won't the code by the first approach be useless once we commit second
> approach?
>
>
> [1] - https://www.postgresql.org/message-id/CAN_9JTwtTTnxhbr5AHuqVcriz3HxvPpx1JWE--DCSdJYuHrLtA%40mail.gmail.com
>

I glanced over both patches. Just the opposite, I have a hunch that v3
is always better than v5. Here's my 6-minute understanding of both.

v5 (the one with a simple partial aggregate) works by pushing a little
bit of partial aggregate onto workers, and perform grouping aggregate
above gather. This has two interesting outcomes: we can execute
unmodified partial aggregate on the workers, and execute almost
unmodified rollup aggreegate once the trans values are gathered. A
parallel plan for a query like

SELECT count(*) FROM foo GROUP BY GROUPING SETS (a), (b), (c), ();

can be

Finalize GroupAggregate
Output: count(*)
Group Key: a
Group Key: b
Group Key: c
Group Key: ()
Gather Merge
Partial GroupAggregate
Output: PARTIAL count(*)
Group Key: a, b, c
Sort
Sort Key: a, b, c
Parallel Seq Scan on foo

v3 ("the one with grouping set id") really turns the plan from a tree to
a multiplexed pipe: we can execute grouping aggregate on the workers,
but only partially. When we emit the trans values, also tag the tuple
with a group id. After gather, finalize the aggregates with a modified
grouping aggregate. Unlike a non-split grouping aggregate, the finalize
grouping aggregate does not "flow" the results from one rollup to the
next one. Instead, each group only advances on partial inputs tagged for
the group.

Finalize HashAggregate
Output: count(*)
Dispatched by: (GroupingSetID())
Group Key: a
Group Key: b
Group Key: c
Gather
Partial GroupAggregate
Output: PARTIAL count(*), GroupingSetID()
Group Key: a
Sort Key: b
Group Key: b
Sort Key: c
Group Key: c
Sort
Sort Key: a
Parallel Seq Scan on foo

Note that for the first approach to be viable, the partial aggregate
*has to* use a group key that's the union of all grouping sets. In cases
where individual columns have a low cardinality but joint cardinality is
high (say columns a, b, c each has 16 distinct values, but they are
independent, so there are 4096 distinct values on (a,b,c)), this results
in fairly high traffic through the shm tuple queue.

Cheers,
Jesse

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maciek Sakrejda 2020-01-25 00:26:57 Re: Duplicate Workers entries in some EXPLAIN plans
Previous Message Thomas Munro 2020-01-24 22:29:11 Re: [HACKERS] kqueue