Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2017-12-08 01:23:19
Message-ID: CAEepm=03YnefpCeB=Z67HtQAOEMuhKGyPCY_S1TeH=9a2Rr0LQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 8, 2017 at 1:57 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> 1. Thomas' barrier abstraction was added by commit 1145acc7. I think
> that you should use a static barrier in tuplesort.c now, and rip out
> the ConditionVariable fields in the Sharedsort struct. It's only a
> slightly higher level of abstraction for tuplesort.c, which makes only
> a small difference given the simple requirements of tuplesort.c.
> However, I see no reason to not go that way if that's the new
> standard, which it is. This looks like it will be fairly easy.

I thought about this too. A static barrier seems ideal for it, except
for one tiny detail. We'd initialise the barrier with the number of
participants, and then after launching we get to find out how many
workers were really launched using pcxt->nworkers_launched, which may
be a smaller number. If it's a smaller number, we need to adjust the
barrier to the smaller party size. We can't do that by calling
BarrierDetach() n times, because Andres convinced me to assert that
you didn't try to detach from a static barrier (entirely reasonably)
and I don't really want a process to be 'detaching' on behalf of
someone else anyway. So I think we'd need to add an extra barrier
function that lets you change the party size of a static barrier.
Yeah, that sounds like a contradiction... but it's not the same as
the attach/detach workflow because static parties *start out
attached*, which is a very important distinction (it means that client
code doesn't have to futz about with phases, or in other words the
worker doesn't have to consider the possibility that it started up
late and missed all the action and the sort is finished). The tidiest
way to provide this new API would, I think, be to change the internal
function BarrierDetachImpl() to take a parameter n and reduce
barrier->participants by that number, and then add a function
BarrierForgetParticipants(barrier, n) [insert better name] and have it
call BarrierDetachImpl(). Then the latter's assertion that
!static_party could move out to BarrierDetach() and
BarrierArriveAndDetach(). Alternatively, we could use the dynamic API
(see earlier parentheses about phases).

The end goal would be that code like this can use
BarrierInit(&barrier, participants), then (if necessary)
BarrierForgetParticipants(&barrier, nonstarters), and then they all
just have to call BarrierArriveAndWait() at the right time and that's
all. Nice and tidy.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-12-08 02:03:50 Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Previous Message Craig Ringer 2017-12-08 01:14:23 Re: Postgres with pthread