Re: parallel distinct union and aggregate support patch

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: bucoo(at)sohu(dot)com
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: parallel distinct union and aggregate support patch
Date: 2020-11-27 15:55:25
Message-ID: 937fa586-1d97-c732-47b8-1697ac0f6360@iki.fi
Lists: pgsql-hackers

I also had a quick look at the patch and the comments made so far. Summary:

1. The performance results are promising.

2. The code needs comments.

Regarding the design:

Thomas Munro mentioned the idea of a "Parallel Repartition" node that
would redistribute tuples like this. As I understand it, the difference
is that this BatchSort implementation collects all tuples in a tuplesort
or a tuplestore, while a Parallel Repartition node would just
redistribute the tuples to the workers, without buffering. The receiving
worker could put the tuples into a tuplestore or sort them if needed.
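To make the non-buffering idea concrete, here is a minimal sketch, assuming that tuples are routed by a hash of the grouping key. The names `route` and `redistribute` and the use of in-memory deques are hypothetical stand-ins for illustration; they are not PostgreSQL executor or shared-memory queue APIs.

```python
import zlib
from collections import deque


def route(key, n_workers):
    """Pick the receiving worker for a key (hypothetical helper).

    crc32 over repr(key) is used only because it is deterministic across
    runs; a real executor would use the type's own hash function.
    """
    return zlib.crc32(repr(key).encode()) % n_workers


def redistribute(tuples, key_of, queues):
    """Forward each tuple to its worker's queue as soon as it arrives.

    Nothing is collected or sorted here. Whether to buffer or sort is
    left entirely to the receiving worker.
    """
    for tup in tuples:
        queues[route(key_of(tup), len(queues))].append(tup)


# Three simulated workers, each with its own inbound queue.
queues = [deque() for _ in range(3)]
redistribute([(1, "a"), (2, "b"), (1, "c")], key_of=lambda t: t[0], queues=queues)
```

The property that matters is that all tuples with the same key end up in the same queue, so each worker can process its share independently.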

I think a non-buffering Repartition node would be simpler, and thus
better. In these patches, you have a BatchSort node, and batchstore, but
a simple Parallel Repartition node could do both. For example, to
implement distinct:

Gather
   -> Unique
      -> Sort
         -> Parallel Redistribute
            -> Parallel Seq Scan

And a Hash Agg would look like this:

Gather
   -> Hash Agg
      -> Parallel Redistribute
         -> Parallel Seq Scan
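As a rough illustration of why these two plan shapes give correct results, the sketch below simulates them in a single process. It assumes the redistribution is by hash of the key, so every key is owned by exactly one worker. All function names are hypothetical, the workers run one after another rather than in parallel, and a count stands in for an arbitrary aggregate.

```python
import zlib
from collections import defaultdict


def partition_by_key(rows, key_of, n_workers):
    """Stand-in for Parallel Redistribute: equal keys go to the same worker."""
    parts = [[] for _ in range(n_workers)]
    for row in rows:
        h = zlib.crc32(repr(key_of(row)).encode())
        parts[h % n_workers].append(row)
    return parts


def parallel_distinct(rows, n_workers=4):
    """Gather -> Unique -> Sort -> Redistribute, simulated sequentially."""
    gathered = []
    for part in partition_by_key(rows, lambda r: r, n_workers):
        prev = object()              # sentinel that equals no row
        for row in sorted(part):     # Sort, local to this worker
            if row != prev:          # Unique, adjacent duplicates only
                gathered.append(row)
            prev = row
    return gathered                  # Gather: plain concatenation


def parallel_hash_count(rows, key_of, n_workers=4):
    """Gather -> Hash Agg -> Redistribute, simulated sequentially."""
    gathered = {}
    for part in partition_by_key(rows, key_of, n_workers):
        counts = defaultdict(int)    # Hash Agg, local to this worker
        for row in part:
            counts[key_of(row)] += 1
        gathered.update(counts)      # key sets are disjoint across workers
    return gathered
```

Because no key appears in more than one partition, Gather only has to concatenate the per-worker results; no second deduplication or final aggregation step is needed above it. Note that the gathered output of `parallel_distinct` is sorted within each worker's share but not globally.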

I'm marking this as Waiting on Author in the commitfest.

- Heikki
