Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-01-26 06:00:23
Message-ID: CAA4eK1Kg5o4RoAyg9Qr90Owr6Aj_F=fxMTCLmyHDKukPYb1GSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 25, 2018 at 1:24 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Tue, Jan 23, 2018 at 9:43 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Right, but what if the worker dies due to something proc_exit(1) or
>> something like that before calling BarrierArriveAndWait. I think this
>> is part of the problem we have solved in
>> WaitForParallelWorkersToFinish such that if the worker exits abruptly
>> at any point due to some reason, the system should not hang.
>
> I have used Thomas' chaos-monkey-fork-process.patch to verify:
>
> 1. The problem of fork failure causing nbtsort.c to wait forever is a
> real problem. Sure enough, the coding pattern within
> _bt_leader_heapscan() can cause us to wait forever even with commit
> 2badb5afb89cd569500ef7c3b23c7a9d11718f2f, more or less as a
> consequence of the patch not using tuple queues (it uses the new
> tuplesort sharing thing instead).
>
> 2. Simply adding a single call to WaitForParallelWorkersToFinish()
> within _bt_leader_heapscan() before waiting on our condition variable
> fixes the problem -- errors are reliably propagated, and we never end
> up waiting forever.
>
> 3. This short term fix works just as well with
> parallel_leader_participation=off.
>
> At this point, my preferred solution is for someone to go implement
> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems
> like the logical person for the job).
>

I can implement it and share a prototype patch with you which you can
use to test parallel sort stuff. I would like to highlight the
difference which you will see with WaitForParallelWorkersToAttach as
compare to WaitForParallelWorkersToFinish() is that the former will
give you how many of nworkers_launched workers are actually launched
whereas latter gives an error if any of the expected workers is not
launched. I feel former is good and your proposed way of calling it
after the leader is done with its work has alleviated the minor
disadvantage of this API which is that we need for workers to startup.

However, now I see that you and Thomas are trying to find a different
way to overcome this problem differently, so not sure if I should go
ahead or not. I have seen that you told you wanted to look at
Thomas's proposed stuff carefully tomorrow, so I will wait for you
guys to decide which way is appropriate.

> Once that's committed, I can
> post a new version of the patch that uses that new infrastructure --
> I'll add a call to the new function, without changing anything else.
> Failing that, we could actually just use
> WaitForParallelWorkersToFinish(). I still don't want to use a barrier,
> mostly because it complicates parallel_leader_participation=off,
> something that Amit is in agreement with [2][3].
>

I think if we want we can use barrier API's to solve this problem, but
I kind of have a feeling that it doesn't seem to be the most
appropriate API, especially because existing API like
WaitForParallelWorkersToFinish() can serve the need in a similar way.

Just to conclude, following are proposed ways to solve this problem:

1. Implement a new API WaitForParallelWorkersToAttach and use that to
solve this problem. Peter G. and Amit thinks, this is a good way to
solve this problem.
2. Use existing API WaitForParallelWorkersToFinish to solve this
problem. Peter G. feels that if API mentioned in #1 is not available,
we can use this to solve the problem and I agree with that position.
Thomas is not against it.
3. Use Thomas's new way to detect such failures. It is not clear to
me at this stage if any one of us have accepted it to be the way to
proceed, but Thomas and Peter G. want to investigate it further.
4. Use of Barrier API to solve this problem. Robert appears to be
strongly in favor of this approach.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-01-26 06:03:17 Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Previous Message Tom Lane 2018-01-26 05:43:47 Re: Redefining inet_net_ntop