Re: Parallel grouping sets

From: Richard Guo <riguo(at)pivotal(dot)io>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel grouping sets
Date: 2019-07-31 08:06:30
Message-ID: CAN_9JTyPnkHuZ8ruWCwW-QiCefntONaE+E0w7H63hw2ywyh-4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 30, 2019 at 11:05 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On Tue, Jul 30, 2019 at 03:50:32PM +0800, Richard Guo wrote:
> >On Wed, Jun 12, 2019 at 10:58 AM Richard Guo <riguo(at)pivotal(dot)io> wrote:
> >
> >> Hi all,
> >>
> >> Paul and I have been hacking recently to implement parallel grouping
> >> sets, and here we have two implementations.
> >>
> >> Implementation 1
> >> ================
> >>
> >> Attached is the patch and also there is a github branch [1] for this
> >> work.
> >>
> >
> >Rebased with the latest master.
> >
>
> Hi Richard,
>
> thanks for the rebased patch. I think the patch is mostly fine (at least I
> don't see any serious issues). A couple minor comments:
>

Hi Tomas,

Thank you for reviewing this patch.

>
> 1) I think get_number_of_groups() would deserve a short explanation why
> it's OK to handle (non-partial) grouping sets and regular GROUP BY in the
> same branch. Before these cases were clearly separated, now it seems a bit
> mixed up and it may not be immediately obvious why it's OK.
>

Added a short comment in get_number_of_groups() explaining the behavior
when doing partial aggregation for grouping sets.

>
> 2) There are new regression tests, but they are not added to any schedule
> (parallel or serial), and so are not executed as part of "make check". I
> suppose this is a mistake.
>

Yes, thanks. Added the new regression test in parallel_schedule and
serial_schedule.

>
> 3) The regression tests do check plan and results like this:
>
> EXPLAIN (COSTS OFF, VERBOSE) SELECT ...;
> SELECT ... ORDER BY 1, 2, 3;
>
> which however means that the query might easily use a different plan than
> what's verified in the eplain (thanks to the additional ORDER BY clause).
> So I think this should explain and execute the same query.
>
> (In this case the plans seems to be the same, but that may easily change
> in the future, and we could miss it here, failing to verify the results.)
>

Thank you for pointing this out. Fixed it in V4 patch.

>
> 4) It might be a good idea to check the negative case too, i.e. a query on
> data set that we should not parallelize (because the number of partial
> groups would be too high).
>

Yes, agree. Added a negative case.

>
>
> Do you have any plans to hack on the second approach too? AFAICS those two
> approaches are complementary (address different data sets / queries), and
> it would be nice to have both. One of the things I've been wondering is if
> we need to invent gset_id as a new concept, or if we could simply use the
> existing GROUPING() function - that uniquely identifies the grouping set.
>
>
Yes, I'm planning to hack on the second approach in short future. I'm
also reconsidering the gset_id stuff since it brings a lot of complexity
for the second approach. I agree with you that we can try GROUPING()
function to see if it can replace gset_id.

Thanks
Richard

Attachment Content-Type Size
v4-0001-Implementing-parallel-grouping-sets.patch application/octet-stream 24.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-07-31 08:17:09 Re: Problem with default partition pruning
Previous Message Amit Langote 2019-07-31 08:04:38 Re: partition routing layering in nodeModifyTable.c