Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-01-26 23:40:59
Message-ID: CAEepm=0NpJ8j8edUReQsRmE0HqO5nyGYr+b2wPCPp+KN6Ks4kw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 26, 2018 at 7:30 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> However, now I see that you and Thomas are trying to find a different
>> way to overcome this problem, so I am not sure if I should go ahead or
>> not. I have seen that you said you wanted to look at Thomas's proposed
>> stuff carefully tomorrow, so I will wait for you guys to decide which
>> way is appropriate.
>
> I suspect that the overhead of Thomas' experimental approach is going
> to cause problems in certain cases. Cases that are hard to foresee.
> That patch makes HandleParallelMessages() set ParallelMessagePending
> artificially, pending confirmation of having launched all workers.
>
> It was an interesting experiment, but I think that your
> WaitForParallelWorkersToAttach() idea has a better chance of working
> out.

Thanks for looking into this. Yeah. I think you're right that it
could add a bit of overhead in some cases (ie if you receive a lot of
signals that AREN'T caused by fork failure, then you'll enter
HandleParallelMessages() every time unnecessarily), and it does feel a
bit kludgy. The best idea I have to fix that so far is like this:

(1) add a member fork_failure_count to struct BackgroundWorkerArray;

(2) in do_start_bgworker(), whenever fork fails, do
++BackgroundWorkerData->fork_failure_count (ie before a signal is sent
to the leader);

(3) in procsignal_sigusr1_handler, where we normally do a bunch of
CheckProcSignal(PROCSIG_XXX) stuff, if
(BackgroundWorkerData->fork_failure_count !=
last_observed_fork_failure_count) then call
HandleParallelMessageInterrupt().

As far as I know, as long as fork_failure_count is (say) int32 (ie not
prone to tearing), no locking is required, due to the barriers
implicit in the syscalls involved there. This is still slightly more
pessimistic than it needs to be (the failed fork may be for someone
else's ParallelContext), but only in rare cases, so it would be
practically as good as precise PROCSIG delivery. It's just that we
aren't allowed to deliver PROCSIGs from the postmaster. We are
allowed to communicate through BackgroundWorkerData, and there is a
precedent for cluster-visible event counters in there already.
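
To make that concrete, here is a rough sketch of what (1)-(3) could
look like. This isn't a patch; the surrounding code and the exact
placement of the new counter are just assumptions on my part, though
the struct and functions named are the existing ones:

/* (1) bgworker.c: add a cluster-visible event counter, next to the
 * existing parallel_register_count/parallel_terminate_count */
typedef struct BackgroundWorkerArray
{
    int         total_slots;
    uint32      parallel_register_count;
    uint32      parallel_terminate_count;
    uint32      fork_failure_count;     /* new: bumped by postmaster */
    BackgroundWorkerSlot slot[FLEXIBLE_ARRAY_MEMBER];
} BackgroundWorkerArray;

/* (2) postmaster.c, do_start_bgworker(): in the fork-failure path,
 * before the leader gets signalled */
++BackgroundWorkerData->fork_failure_count;

/* (3) procsignal.c: remember the last value this backend saw ... */
static uint32 last_observed_fork_failure_count = 0;

/* ... and in procsignal_sigusr1_handler(), next to the usual
 * CheckProcSignal(PROCSIG_XXX) checks; a plain 32-bit read shouldn't
 * tear, so no locking, per the reasoning above */
if (BackgroundWorkerData->fork_failure_count !=
    last_observed_fork_failure_count)
{
    last_observed_fork_failure_count =
        BackgroundWorkerData->fork_failure_count;
    HandleParallelMessageInterrupt();   /* sets ParallelMessagePending */
}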

I think you should proceed with Amit's plan. If we ever make a plan
like the above work in the future, it'd render that redundant by
turning every CFI() (CHECK_FOR_INTERRUPTS()) into a cancellation point
for fork failure, but I'm not planning to investigate further given
the muted response to my scheming in this area so far.
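
For comparison, this is roughly how I imagine Amit's proposed
WaitForParallelWorkersToAttach() being used on the leader side (the
interface is his proposal, not something that exists yet, and the
call site is just for illustration):

/* e.g. in nbtsort.c's _bt_begin_parallel(), after launching workers */
LaunchParallelWorkers(pcxt);

if (pcxt->nworkers_launched > 0)
{
    /*
     * Wait until every launched worker has either attached to its
     * error queue or is known to have failed at fork time, so that a
     * fork failure can't leave the leader waiting forever for the
     * workers' tuplesorts.
     */
    WaitForParallelWorkersToAttach(pcxt);
}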

--
Thomas Munro
http://www.enterprisedb.com
