Re: parallel distinct union and aggregate support patch

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: bucoo(at)sohu(dot)com, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel distinct union and aggregate support patch
Date: 2020-11-29 06:23:18
Message-ID: CAFiTN-vaVErT4s_XC4ZeAbN9pP2g_BeKwo2uzD_sUohdU2dg=g@mail.gmail.com

On Fri, Nov 27, 2020 at 9:25 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I also had a quick look at the patch and the comments made so far. Summary:
>
> 1. The performance results are promising.
>
> 2. The code needs comments.
>
> Regarding the design:
>
> Thomas Munro mentioned the idea of a "Parallel Repartition" node that
> would redistribute tuples like this. As I understand it, the difference
> is that this BatchSort implementation collects all tuples in a tuplesort
> or a tuplestore, while a Parallel Repartition node would just
> redistribute the tuples to the workers, without buffering.

I think the advantage of "Parallel BatchSort" is that it gives the
workers the flexibility to pick batches dynamically after the
repartition.  OTOH, if we distribute batches directly based on the
worker number, the advantage is that the operator becomes quite
flexible: e.g. to implement a merge join we can just place the
"Parallel Repartition" node above both sides of the scan and we simply
get a batch-wise merge join, because each worker knows its batch.
Whereas if we allow workers to pick batches dynamically, the
right-hand node needs to know which batch was picked.  That is not
very difficult, since it is the same worker, but it seems less
flexible.
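Just to illustrate (a hypothetical plan shape, not something either
patch produces today), the batch-wise merge join I am describing would
look something like:

Gather
  -> Merge Join
      -> Sort
          -> Parallel Repartition
              -> Parallel Seq Scan on t1
      -> Sort
          -> Parallel Repartition
              -> Parallel Seq Scan on t2

Each worker would process only the batch assigned to it on both sides,
so matching join keys always end up in the same worker.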

> The receiving
> worker could put the tuples to a tuplestore or sort if needed.

If we use it without buffering, then the sending worker can directly
put the tuples into the respective sort/tuplestore node.
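To sketch what I mean (the names here are made up for illustration,
this is not the patch's actual API): each batch would own a tuplesort,
and the producing worker routes every tuple straight into the sort for
its batch, roughly:

    /* hypothetical: route the tuple into the tuplesort of its batch */
    batchno = hashvalue % nbatches;
    tuplesort_puttupleslot(batch_sorts[batchno], slot);

so the data is written only once, already as sorted runs.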

> I think a non-buffering Repartition node would be simpler, and thus
> better. In these patches, you have a BatchSort node, and batchstore, but
> a simple Parallel Repartition node could do both. For example, to
> implement distinct:
>
> Gather
>   -> Unique
>       -> Sort
>           -> Parallel Redistribute
>               -> Parallel Seq Scan
>
> And a Hash Agg would look like this:
>
> Gather
>   -> Hash Agg
>       -> Parallel Redistribute
>           -> Parallel Seq Scan
>
>
> I'm marking this as Waiting on Author in the commitfest.

I agree that a simple Parallel Redistribute/Repartition node would be
flexible and could do both, but I see one problem.  If we use the
common operator, the Parallel Redistribute node will first use a
tuplestore to redistribute the data per worker, and then each worker
might go to the disk again to sort its respective data.  Instead, if
we do a parallel sort while redistributing the data itself, each
worker gets its batch directly in the form of sorted tapes.
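
To spell that out with your distinct example (again just a sketch of
the idea, not actual plan output):

Gather
  -> Unique
      -> Sort                        -- second pass over the data
          -> Parallel Redistribute   -- first pass, tuplestore per batch
              -> Parallel Seq Scan

versus

Gather
  -> Unique
      -> Parallel BatchSort          -- one pass, sorted runs per batch
          -> Parallel Seq Scan

In the first shape each tuple may be written to disk twice, once by
the redistribute and once by the sort; in the second the
redistribution itself produces the sorted tapes.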

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
