Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2017-12-12 10:09:29
Message-ID: CAGPqQf2g+OORAnnPyPtr5Oh9n=Dp4BCqkF0urrkw9yivKSmpeA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 10, 2017 at 3:06 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Thu, Dec 7, 2017 at 11:28 PM, Rushabh Lathia
> <rushabh(dot)lathia(at)gmail(dot)com> wrote:
> > I thought parallel_leader_participation is generic GUC which get effect
> > for all parallel operation. isn't it? On that understanding I just
> update
> > the
> > documentation of parallel_leader_participation into config.sgml to
> > make it more generalize.
>
> Okay. I'm not quite sure how to fit parallel_leader_participation into
> parallel CREATE INDEX (see my remarks on that below).
>
> I see a new bug in the patch (my own bug). Which is: the CONCURRENTLY
> case should obtain a RowExclusiveLock on the index relation within
> _bt_worker_main(), not an AccessExclusiveLock. That's all the leader
> has at that point within CREATE INDEX CONCURRENTLY.
>
>
Oh right. I also missed to test that earlier. Fixed now.

> I now believe that index_create() should reject catalog parallel
> CREATE INDEX directly, just as it does for catalog CREATE INDEX
> CONCURRENTLY. That logic should be generic to all AMs, since the
> reasons for disallowing catalog parallel index builds are generic.
>
>
Sorry I didn't get this, reject means? you mean it should throw an error
catalog parallel CREATE INDEX? or just suggesting to set the
ParallelWorkers and may be LeaderAsWorker from index_create()
or may be index_build()?

> On a similar note, *maybe* we should even call
> plan_create_index_workers() from index_create() (or at least some
> point within index.c). You're going to need a new field or two within
> IndexInfo for this, beside ii_Concurrent/ii_BrokenHotChain (next to
> the other stuff that is only used during index builds). Maybe
> ii_ParallelWorkers, and ii_LeaderAsWorker. What do you think of this
> suggestion? It's probably neater overall...though I'm less confident
> that this one is an improvement.
>
> Note that cluster.c calls plan_cluster_use_sort() directly, while
> checking "OldIndex->rd_rel->relam == BTREE_AM_OID" as a prerequisite
> to calling it. This seems like it might be considered an example that
> we should follow within index.c -- plan_create_index_workers() is
> based on plan_cluster_use_sort().
>
> > Yes, to me also it's looks kind of impossible situation but then too
> > it make sense to make one local variable and then always read the
> > value from that.
>
> I think that it probably is technically possible, though the user
> would have to be doing something insane for it to be a problem. As I'm
> sure you understand, it's simpler to eliminate the possibility than it
> is to reason about it never happening.
>
>
yes.

> >> 1. Thomas' barrier abstraction was added by commit 1145acc7. I think
> >> that you should use a static barrier in tuplesort.c now, and rip out
> >> the ConditionVariable fields in the Sharedsort struct.
>
> > Pending, as per Thomas' explanation, it seems like need some more
> > work in the barrier APIs.
>
> Okay. It's not the case that parallel tuplesort would significantly
> benefit from using the barrier abstraction, so I don't think we need
> to consider this a blocker to commit. My concern is mostly just that
> everyone is on the same page with barriers.
>
>
True, if needed, this can be also done later on.

> > Ah nice catch. I passed the local variable (leaderasworker) of
> > _bt_heapscan()
> > to plan_create_index_workers() rather than direct reading value from the
> > parallel_leader_participation (reasons are same as you explained
> earlier).
>
> Cool. I don't think that this should be a separate patch -- please
> rebase + squash.
>
>
Sure, done.

> Do you think that the main part of the cost model needs to care about
> parallel_leader_participation, too?
>
> compute_parallel_worker() assumes that the caller is planning a
> parallel-sequential-scan-alike thing, in the sense that the leader
> only acts like a worker in cases that probably don't have many
> workers, where the leader cannot keep itself busy as a leader. That's
> actually quite different to parallel CREATE INDEX, because the
> leader-as-worker state will behave in exactly the same way as a worker
> would, no matter how many workers there are. The leader process is
> guaranteed to give its full attention to being a worker, because it
> has precisely nothing else to do until workers finish. This makes me
> think that we may need to immediately do something with the result of
> compute_parallel_worker(), to consider whether or not a
> leader-as-worker state should be used, despite the fact that no
> existing compute_parallel_worker() caller does anything like this.
>
>
I agree with you. compute_parallel_worker() mainly design for the
scan-alike things. Where as parallel create index is different in a
sense where leader has as much power as worker. But at the same
time I don't see any side effect or negative of that with PARALLEL
CREATE INDEX. So I am more towards not changing that aleast
for now - as part of this patch.

Thanks for review.

Regards,
Rushabh Lathia
www.EnterpriseDB.com

Attachment Content-Type Size
0003-Adjust-planner-code-to-consider-parallel_leader_part.patch text/x-patch 4.6 KB
0002-Obtain-a-RowExclusiveLock-on-the-index-relation-with.patch text/x-patch 1.1 KB
0001-Add-parallel-tuple-sort-for-CREATE-INDEX.patch text/x-patch 150.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-12-12 10:13:11 Re: [HACKERS] Partition-wise aggregation/grouping
Previous Message Thomas Munro 2017-12-12 09:46:11 Re: [HACKERS] Parallel Hash take II