Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-01-23 01:22:50
Message-ID: CAH2-Wzn5HnufTJmanP-ML-xxeKEjPoBPnVd5noS+SxHe5jRHnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 18, 2018 at 9:22 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> But, hang on a minute. Why do the workers need to wait for the leader
> anyway? Can't they just exit once they're done sorting? I think the
> original reason why this ended up in the patch is that we needed the
> leader to assume ownership of the tapes to avoid having the tapes get
> blown away when the worker exits. But, IIUC, with sharedfileset.c,
> that problem no longer exists. The tapes are jointly owned by all of
> the cooperating backends and the last one to detach from it will
> remove them. So, if the worker sorts, advertises that it's done in
> shared memory, and exits, then nothing should get removed and the
> leader can adopt the tapes whenever it gets around to it.

BTW, I want to point out that using the shared fileset infrastructure
is only a very small impediment to adding randomAccess support. If we
really wanted to support randomAccess for the leader's tapeset, while
recycling blocks from worker BufFiles, it looks like all we'd have to
do is change PathNameOpenTemporaryFile() to open files O_RDWR, rather
than O_RDONLY (shared fileset BufFiles that are opened after export
always have O_RDONLY segments -- we'd also have to change some
assertions, as well as some comments). Overall, this approach looks
straightforward, and isn't something that I can find an issue with
after an hour or so of manual testing.

Now, I'm not actually suggesting we go that way. As you know,
randomAccess isn't used by CREATE INDEX, and randomAccess may never be
needed for any parallel sort operation. More importantly, Thomas made
PathNameOpenTemporaryFile() use O_RDONLY for a reason, and I don't
want to trade one special case (randomAccess disallowed for parallel
tuplesort leader tapeset) in exchange for another one (the logtape.c
calls to BufFileOpenShared() ask for read-write BufFiles, not
read-only BufFiles).

I'm pointing this out because this is something that should increase
confidence in the changes I've proposed to logtape.c. The fact that
randomAccess support *would* be straightforward is a sign that I
haven't accidentally introduced some other assumption, or special
case.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-01-23 01:40:59 Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Previous Message Ian Barwick 2018-01-23 00:28:45 Re: Add %r substitution for psql prompts to show recovery status