Re: parallel distinct union and aggregate support patch

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: bucoo(at)sohu(dot)com, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel distinct union and aggregate support patch
Date: 2020-11-28 23:16:42
Message-ID: CA+Tgmob2LY=Q9jj99s3xPs7zd5dq3M=mObUGGgQWK3CAmC7trA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 27, 2020 at 10:55 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> I think a non-buffering Reparttion node would be simpler, and thus
> better. In these patches, you have a BatchSort node, and batchstore, but
> a simple Parallel Repartition node could do both. For example, to
> implement distinct:
>
> Gather
> - > Unique
> -> Sort
> -> Parallel Redistribute
> -> Parallel Seq Scan
>
> And a Hash Agg would look like this:
>
> Gather
> - > Hash Agg
> -> Parallel Redistribute
> -> Parallel Seq Scan
>
> I'm marking this as Waiting on Author in the commitfest.

I'm also intrigued by the parallel redistribute operator -- it seems
like it might be more flexible than this approach. However, I'm
concerned that there may be deadlock risks. If there is no buffer, or
a fixed-size buffer, the buffer might be full, and process trying to
jam tuples into the parallel redistribute would have to wait. Now if A
can wait for B and at the same time B can wait for A, deadlock will
ensue. In a naive implementation, this could happen with a single
parallel redistribute operator: worker 1 is trying to send a tuple to
worker 2, which can't receive it because it's busy sending a tuple to
worker 1. That could probably be fixed by arranging for workers to try
to try to receive data whenever they block in the middle of sending
data. However, in general there can be multiple nodes that cause
waiting in the tree: any number of Parallel Redistribute nodes, plus a
Gather, plus maybe other stuff. The cheap way out of that problem is
to use a buffer that can grow arbitrarily large, but that's not
terribly satisfying either.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-11-28 23:49:45 Re: proposal: possibility to read dumped table's name from file
Previous Message Eugen Konkov 2020-11-28 22:57:58 Feature Request: Report additionally error value