Re: Parallel grouping sets

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Pengzhou Tang <ptang(at)pivotal(dot)io>
Cc: Jesse Zhang <sbjesse(at)gmail(dot)com>, Richard Guo <riguo(at)pivotal(dot)io>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel grouping sets
Date: 2020-02-24 10:27:07
Message-ID: CAMbWs4-fjjfnHNJA9YbeDcku5KAYN_LTVEPOaTrKLioLK23d1w@mail.gmail.com
Lists: pgsql-hackers

To summarize the current state of parallel grouping sets, we now have
two available implementations for it.

1) Each worker performs an aggregation step, producing a partial result
for each group of which that process is aware. Then the partial results
are gathered to the leader, which then performs a grouping sets
aggregation, as in patch [1].

This implementation can be inefficient, because the group key for the
Partial Aggregate has to include all the columns involved in the
grouping sets, so the workers may reduce the data very little before it
is gathered.
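
As a rough illustration of implementation 1 (not the code from patch
[1]), the sketch below models it in Python for a SUM aggregate. The
columns a, b and v, the grouping sets ((a), (b)), and the function names
are all hypothetical, chosen only for this example. Each worker groups by
every column that appears in any grouping set, and the leader then runs
the grouping sets aggregation over the gathered partial sums.

```python
from collections import defaultdict

# Hypothetical query shape: SUM(v) ... GROUP BY GROUPING SETS ((a), (b))
GROUPING_SETS = [("a",), ("b",)]
ALL_KEYS = ("a", "b")  # union of the columns used by the grouping sets


def partial_aggregate(rows):
    """Worker step: group by every column involved in the grouping sets."""
    partial = defaultdict(int)
    for row in rows:
        partial[tuple(row[k] for k in ALL_KEYS)] += row["v"]
    return partial


def leader_grouping_sets(partials):
    """Leader step: grouping sets aggregation over the gathered partial sums."""
    result = {gs: defaultdict(int) for gs in GROUPING_SETS}
    for partial in partials:
        for key, partial_sum in partial.items():
            row = dict(zip(ALL_KEYS, key))
            for gs in GROUPING_SETS:
                result[gs][tuple(row[k] for k in gs)] += partial_sum
    return {gs: dict(groups) for gs, groups in result.items()}


# Two workers, each scanning its own share of the table.
worker1 = [{"a": 1, "b": 1, "v": 10}, {"a": 1, "b": 2, "v": 20}]
worker2 = [{"a": 2, "b": 1, "v": 30}, {"a": 1, "b": 1, "v": 40}]

partials = [partial_aggregate(worker1), partial_aggregate(worker2)]
totals = leader_grouping_sets(partials)
```

In this toy input, worker1 still emits two partial groups for its two
rows, because the partial group key is (a, b) rather than (a) or (b)
alone.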

2) Each worker performs a grouping sets aggregation on its partial
data, and tags each tuple produced by the partial aggregate with a
'GroupingSetId'. Then the partial results are gathered to the leader,
which finalizes them in one of two ways: it performs a modified grouping
aggregate, which dispatches the partial results into different pipes
according to 'GroupingSetId', as in patch [2]; or it performs a normal
aggregation, with 'GroupingSetId' included in the group keys, as
discussed in [3].
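
As a rough illustration of implementation 2 (not the code from patch
[2]), the sketch below models the variant where the leader performs a
normal aggregation with the set id in the group key. It uses a SUM
aggregate; the columns a, b and v, the grouping sets ((a), (b)), and the
function names are all hypothetical, and the position of a set in the
list stands in for 'GroupingSetId'.

```python
from collections import defaultdict

# Hypothetical query shape: SUM(v) ... GROUP BY GROUPING SETS ((a), (b))
GROUPING_SETS = [("a",), ("b",)]


def worker_grouping_sets(rows):
    """Worker step: grouping sets aggregation, tagging each result with a set id."""
    partial = defaultdict(int)
    for row in rows:
        for set_id, gs in enumerate(GROUPING_SETS):
            partial[(set_id, tuple(row[k] for k in gs))] += row["v"]
    return list(partial.items())


def leader_finalize(gathered):
    """Leader step: plain aggregation with the set id as part of the group key."""
    final = defaultdict(int)
    for (set_id, key), partial_sum in gathered:
        final[(set_id, key)] += partial_sum
    return dict(final)


# Two workers, each scanning its own share of the table.
worker1 = [{"a": 1, "b": 1, "v": 10}, {"a": 1, "b": 2, "v": 20}]
worker2 = [{"a": 2, "b": 1, "v": 30}, {"a": 1, "b": 1, "v": 40}]

gathered = worker_grouping_sets(worker1) + worker_grouping_sets(worker2)
final = leader_finalize(gathered)
```

Because the set id is part of the key, partial results for set (a) and
set (b) never merge with each other, even when their key values happen to
be equal.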

The second implementation should generally perform better than the
first one, and we have decided to concentrate on it.

[1]
https://www.postgresql.org/message-id/CAN_9JTx3NM12ZDzEYcOVLFiCBvwMHyM0gENvtTpKBoOOgcs=kw@mail.gmail.com
[2]
https://www.postgresql.org/message-id/CAN_9JTwtTTnxhbr5AHuqVcriz3HxvPpx1JWE--DCSdJYuHrLtA@mail.gmail.com
[3]
https://www.postgresql.org/message-id/CAN_9JTwtzttEmdXvMbJqXt=51kXiBTCKEPKq6kk2PZ6Xz6m5ig@mail.gmail.com

Thanks
Richard
