Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-01-11 22:26:40
Message-ID: CA+TgmoZuo4tVXouNTJfTHbZ=+am1GNZ=u-a1ajyw_uzDd+PW_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 10, 2018 at 1:45 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> There's a lot here I haven't grokked yet, but I'm running out of
> mental energy so I think I'll send this for now and work on this some
> more when time permits, hopefully tomorrow.

Looking at the logtape changes:

While the patch contains, as I said before, an excellent set of how-to
directions explaining how to use the new parallel sort facilities in
tuplesort.c, there seems to be no such thing for logtape.c, and as a
result I find it a bit unclear how the interface is supposed to work.
I think it would be good to add a similar summary here.

It seems like the words "leader" and "worker" here refer to the leader
of a parallel operation and the associated workers, but do we really
need to make that assumption? Couldn't we generally describe this as
merging a bunch of 1-tape LogicalTapeSets created from a SharedFileSet
into a single LogicalTapeSet that can thereafter be read by the
process that does the merging?

+ /* Pass worker BufFile pieces, and a placeholder leader piece */
+ for (i = 0; i < lts->nTapes; i++)
+ {
+ lt = &lts->tapes[i];
+
+ /*
+ * Build concatenated view of all BufFiles, remembering the block
+ * number where each source file begins.
+ */
+ if (i < lts->nTapes - 1)

Unless I'm missing something, the "if" condition just causes the last
pass through this loop to do nothing. If so, why not just change the
loop condition to i < lts->nTapes - 1 and drop the "if" statement
altogether?

+ char filename[MAXPGPATH] = {0};

I don't think you need = {0}, because pg_itoa is about to clobber it anyway.

+ /* Alter worker's tape state (generic values okay for leader) */

What do you mean by generic values?

+ * Each tape is initialized in write state. Serial callers pass ntapes, but
+ * NULL arguments for everything else. Parallel worker callers pass a
+ * shared handle and worker number, but tapeset should be NULL. Leader
+ * passes worker -1, a shared handle, and shared tape metadata. These are
+ * used to claim ownership of worker tapes.

This comment doesn't match the actual function definition terribly
well. Serial callers don't pass NULL for "everything else", because
"int worker" is not going to be NULL. For parallel workers, it's not
entirely obvious whether "a shared handle" means TapeShare *tapes or
SharedFileSet *fileset. "tapeset" sounds like an argument name, but
there is no such argument.

lt->max_size looks like it might be an optimization separate from the
overall patch, but maybe I'm wrong about that.

+ /* palloc() larger than MaxAllocSize would fail */
lt->buffer = NULL;
lt->buffer_size = 0;
+ lt->max_size = MaxAllocSize;

The comment about palloc() should move down to where you assign max_size.

Generally we avoid returning a struct type, so maybe
LogicalTapeFreeze() should instead grow an out parameter of type
TapeShare * which it populates only if not NULL.

Won't LogicalTapeFreeze() fail an assertion in BufFileExportShared()
if the file doesn't belong to a shared fileset? If you adopt the
previous suggestion, we can probably just make whether to call this
contingent on whether the TapeShare * out parameter is provided.

I'm not confident I completely understand what's going on with the
logtape stuff yet, so I might have more comments (or better ones)
after I study this further. To your question about whether to go
ahead and post a new version, I'm OK to keep reviewing this version
for a little longer or to switch to a new one, as you prefer. I have
not made any local changes, just written a blizzard of email text.
:-p

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2018-01-11 22:53:24 Re: Parameters in user-defined aggregate final functions
Previous Message Peter Geoghegan 2018-01-11 22:18:30 Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)