Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject:	Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date:	2018-01-18 10:49:51
Message-ID:	CAEepm=31ewokTZTdReDCTGj_XXadpHmv6O_F7+w0b2+WV_CGVg@mail.gmail.com
Lists:	pgsql-hackers

Hi,

I'm mostly away from my computer this week -- sorry about that, but
here are a couple of quick answers to questions directed at me:

On Thu, Jan 18, 2018 at 4:22 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Wed, Jan 17, 2018 at 10:40 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> While it certainly did occur to me that that was kind of weird, and I
>>> struggled with it on my own for a little while, I ultimately agreed
>>> with Thomas that it added something to have ltsConcatWorkerTapes()
>>> call some buffile function in every iteration of its loop.
>>> (BufFileView() + BufFileViewAppend() are code that Thomas actually
>>> wrote, though I added the asserts and comments myself.)
>>
>> Hmm, well, if Thomas contributed code to this patch, then he needs to
>> be listed as an author. I went searching for an email on this thread
>> (or any other) where he posted code for this, thinking that there
>> might be some discussion explaining the motivation, but I didn't find
>> any. I'm still in favor of erasing this distinction.
>
> I cleared this with Thomas recently, on this very thread, and got a +1
> from him on not listing him as an author. Still, I have no problem
> crediting Thomas as an author instead of a reviewer, even though
> you're now asking me to remove what little code he actually authored.
> The distinction between secondary author and reviewer is often
> blurred, anyway.

The confusion comes about because I gave some small code fragments to
Rushabh for the BufFileView stuff off-list, when suggesting ideas for
how to integrate Peter's patch with some ancestor of my SharedFileSet
patch. It was just a sketch and whether or not any traces remain in
the final commit, please credit me as a reviewer. I need to review
more patches! /me ducks

No objections from me if you hate the "view" idea or implementation
and think it's better to make a destructive append-BufFile-to-BufFile
operation instead.

On Thu, Jan 18, 2018 at 4:28 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Wed, Jan 17, 2018 at 6:20 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I had forgotten about the previous discussion. The sketch in my
>> previous email supposed that we would use dynamic barriers since the
>> whole point, after all, is to handle the fact that we don't know how
>> many participants will really show up. Thomas's idea seems to be that
>> the leader will initialize the barrier based on the anticipated number
>> of participants and then tell it to forget about the participants that
>> don't materialize. Of course, that would require that the leader
>> somehow figure out how many participants didn't show up so that it can
>> deduct them from the counter in the barrier. And how is it going to
>> do that?
>
> I don't know; Thomas?

The idea I mentioned would only work if nworkers_launched is never
over-reported in a scenario that doesn't error out or crash, and never
under-reported in any scenario. Otherwise static barriers may be even
less useful than I thought.
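
To make those two failure modes concrete, here is a toy model in C of the
static-barrier idea. It is only a sketch: every name in it is
hypothetical, nothing here is PostgreSQL code, and it assumes the leader
always takes part as one of the parties.

```c
/* Toy model only; these names are hypothetical, not PostgreSQL APIs. */
typedef enum ToyOutcome
{
    TOY_RELEASES_CORRECTLY,     /* every running party arrives, then release */
    TOY_HANGS,                  /* barrier expects a party that never arrives */
    TOY_RELEASES_EARLY          /* barrier releases before a real worker arrives */
} ToyOutcome;

/*
 * The leader sizes the barrier for itself plus all planned workers, then
 * deducts the planned workers that were not reported as launched.
 */
static int
toy_static_parties(int nworkers_planned, int nworkers_launched)
{
    int         parties = 1 + nworkers_planned;

    parties -= nworkers_planned - nworkers_launched;
    return parties;
}

/*
 * Compare what the barrier expects with what really shows up.
 * nworkers_running is the number of workers that actually started.
 */
static ToyOutcome
toy_static_outcome(int nworkers_planned, int nworkers_launched,
                   int nworkers_running)
{
    int         expected = toy_static_parties(nworkers_planned,
                                              nworkers_launched);
    int         actual = 1 + nworkers_running;

    if (actual < expected)
        return TOY_HANGS;           /* nworkers_launched was over-reported */
    if (actual > expected)
        return TOY_RELEASES_EARLY;  /* nworkers_launched was under-reported */
    return TOY_RELEASES_CORRECTLY;
}
```

With 4 planned workers and nworkers_launched = 2, the model expects 3
parties. If only 1 worker is really running it hangs, and if 3 are
running it releases early.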

>> It's true that the leader will know the value of nworkers_launched,
>> but as the comment in LaunchParallelWorkers() says: "The caller must
>> be able to tolerate ending up with fewer workers than expected, so
>> there is no need to throw an error here if registration fails. It
>> wouldn't help much anyway, because registering the worker in no way
>> guarantees that it will start up and initialize successfully." So it
>> seems to me that a much better plan than having the leader try to
>> figure out how many workers failed to launch would be to just keep a
>> count of how many workers did in fact launch.

(If nworkers_launched can be silently over-reported, then does
parallel_leader_participation = off have a bug? If no workers really
launched and reached the main executor loop but nworkers_launched > 0,
then no one is running the plan.)

>> So my position (at least until Thomas or Andres shows up and tells me
>> why I'm wrong) is that you can use the Barrier API just as it is
>> without any yak-shaving, just by following the sketch I set out
>> before. The additional API I proposed in that sketch isn't really
>> required, although it might be more efficient. But it doesn't really
>> matter: if that comes along later, it will be trivial to adjust the
>> code to take advantage of it.

Yeah, the dynamic Barrier API was intended for things like this. I
was only trying to provide a simpler-to-use alternative that I thought
might work for this particular case (but not executor nodes, which
have another source of uncertainty about party size). It sounds like
it's not actually workable though, and the dynamic API may be the only
way. So the patch would have to deal with explicit phases.

> Okay. I'll work on adopting dynamic barriers in the way you described.
> I just wanted to make sure that we're all on the same page about what
> that looks like.

Looking at Robert's sketch, a few thoughts: (1) it's not OK to attach
and then just exit, you'll need to detach from the barrier both in the
case where the worker exits early because the phase is too high and
the case where you attach in time to help and run to completion;
(2) maybe workers could use BarrierArriveAndDetach() at the end (the
leader needs to use BarrierArriveAndWait(), but the workers don't
really need to wait for each other before they exit, do they?); (3)
erm, maybe it's a problem that errors occurring in workers while the
leader is waiting at a barrier won't unblock the leader (we don't
detach from barriers on abort/exit) -- I'll look into this.
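
Points (1) and (2) can be sketched with a toy, single-process model of a
dynamic barrier in C. This is an illustration under stated assumptions,
not the patch and not PostgreSQL's Barrier implementation: all names are
hypothetical, nothing actually blocks, and a "false" return from
toy_arrive() stands in for "the caller would still be waiting".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; hypothetical names, not PostgreSQL's Barrier. */
typedef struct ToyBarrier
{
    int         phase;          /* current phase number */
    int         participants;   /* parties currently attached */
    int         arrived;        /* parties that arrived in this phase */
} ToyBarrier;

enum { TOY_PHASE_SORTING = 0, TOY_PHASE_DONE = 1 };

static void
toy_init(ToyBarrier *b)
{
    b->phase = 0;
    b->participants = 0;
    b->arrived = 0;
}

/* Attach and report the phase we joined in. */
static int
toy_attach(ToyBarrier *b)
{
    b->participants++;
    return b->phase;
}

/* If every remaining participant has arrived, start the next phase. */
static bool
toy_release_if_complete(ToyBarrier *b)
{
    if (b->participants > 0 && b->arrived == b->participants)
    {
        b->arrived = 0;
        b->phase++;
        return true;
    }
    return false;
}

/* Models arrive-and-wait: false means the caller would still be waiting. */
static bool
toy_arrive(ToyBarrier *b)
{
    assert(b->participants > 0);
    b->arrived++;
    return toy_release_if_complete(b);
}

/* Leave the party; this may let the remaining participants proceed. */
static void
toy_detach(ToyBarrier *b)
{
    assert(b->participants > 0);
    b->participants--;
    toy_release_if_complete(b);
}

/* Worker body: returns true if the worker attached in time to help. */
static bool
toy_worker(ToyBarrier *b)
{
    if (toy_attach(b) > TOY_PHASE_SORTING)
    {
        toy_detach(b);          /* point (1): too late, but still detach */
        return false;
    }
    /* ... the worker's share of the sort would happen here ... */
    toy_detach(b);              /* point (2): leave without waiting; stand-in
                                 * for BarrierArriveAndDetach() */
    return true;
}
```

In this model, a leader that attaches, lets one worker run, and then
calls toy_arrive() moves the barrier to TOY_PHASE_DONE, and a worker
that shows up afterwards detaches without helping. If a late party
attaches and never detaches, the leader's next toy_arrive() returns
false, which is the hang that point (1) is about.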

--
Thomas Munro
http://www.enterprisedb.com

In response to

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) at 2018-01-18 03:28:05 from Peter Geoghegan

Responses

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) at 2018-01-18 12:35:04 from Amit Kapila
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) at 2018-01-18 17:22:59 from Robert Haas
