Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2017-12-09 21:36:08
Message-ID: CAH2-WzkWMr6m6K5bW679tDoacAgP+FQaMs5kxZjbGPA=KRostA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 7, 2017 at 11:28 PM, Rushabh Lathia
<rushabh(dot)lathia(at)gmail(dot)com> wrote:
> I thought parallel_leader_participation is generic GUC which get effect
> for all parallel operation. isn't it? On that understanding I just update
> the
> documentation of parallel_leader_participation into config.sgml to
> make it more generalize.

Okay. I'm not quite sure how to fit parallel_leader_participation into
parallel CREATE INDEX (see my remarks on that below).

I see a new bug in the patch (my own bug). Which is: the CONCURRENTLY
case should obtain a RowExclusiveLock on the index relation within
_bt_worker_main(), not an AccessExclusiveLock. That's all the leader
has at that point within CREATE INDEX CONCURRENTLY.

I now believe that index_create() should reject catalog parallel
CREATE INDEX directly, just as it does for catalog CREATE INDEX
CONCURRENTLY. That logic should be generic to all AMs, since the
reasons for disallowing catalog parallel index builds are generic.

On a similar note, *maybe* we should even call
plan_create_index_workers() from index_create() (or at least some
point within index.c). You're going to need a new field or two within
IndexInfo for this, beside ii_Concurrent/ii_BrokenHotChain (next to
the other stuff that is only used during index builds). Maybe
ii_ParallelWorkers, and ii_LeaderAsWorker. What do you think of this
suggestion? It's probably neater overall...though I'm less confident
that this one is an improvement.

Note that cluster.c calls plan_cluster_use_sort() directly, while
checking "OldIndex->rd_rel->relam == BTREE_AM_OID" as a prerequisite
to calling it. This seems like it might be considered an example that
we should follow within index.c -- plan_create_index_workers() is
based on plan_cluster_use_sort().

> Yes, to me also it's looks kind of impossible situation but then too
> it make sense to make one local variable and then always read the
> value from that.

I think that it probably is technically possible, though the user
would have to be doing something insane for it to be a problem. As I'm
sure you understand, it's simpler to eliminate the possibility than it
is to reason about it never happening.

>> 1. Thomas' barrier abstraction was added by commit 1145acc7. I think
>> that you should use a static barrier in tuplesort.c now, and rip out
>> the ConditionVariable fields in the Sharedsort struct.

> Pending, as per Thomas' explanation, it seems like need some more
> work in the barrier APIs.

Okay. It's not the case that parallel tuplesort would significantly
benefit from using the barrier abstraction, so I don't think we need
to consider this a blocker to commit. My concern is mostly just that
everyone is on the same page with barriers.

> Ah nice catch. I passed the local variable (leaderasworker) of
> _bt_heapscan()
> to plan_create_index_workers() rather than direct reading value from the
> parallel_leader_participation (reasons are same as you explained earlier).

Cool. I don't think that this should be a separate patch -- please
rebase + squash.

Do you think that the main part of the cost model needs to care about
parallel_leader_participation, too?

compute_parallel_worker() assumes that the caller is planning a
parallel-sequential-scan-alike thing, in the sense that the leader
only acts like a worker in cases that probably don't have many
workers, where the leader cannot keep itself busy as a leader. That's
actually quite different to parallel CREATE INDEX, because the
leader-as-worker state will behave in exactly the same way as a worker
would, no matter how many workers there are. The leader process is
guaranteed to give its full attention to being a worker, because it
has precisely nothing else to do until workers finish. This makes me
think that we may need to immediately do something with the result of
compute_parallel_worker(), to consider whether or not a
leader-as-worker state should be used, despite the fact that no
existing compute_parallel_worker() caller does anything like this.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-12-09 21:40:46 Re: proposal: alternative psql commands quit and exit
Previous Message Noah Misch 2017-12-09 21:02:03 SIGPIPE in TAP tests