Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-01-05 22:17:51
Message-ID: CAH2-Wzk66gCxRjs8tQWQwBc8kg1VG9+KNitutjg3K=jsYL+euA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Tue, Jan 2, 2018 at 8:43 PM, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> wrote:
> I agree that plan_create_index_workers() needs to count the leader as a
> normal worker for the CREATE INDEX. So what you proposing is - when
> parallel_leader_participation is true launch (return value of
> compute_parallel_worker() - 1)
> workers. true ?

Almost. We need to not subtract one when only one worker is indicated
by compute_parallel_worker(). I also added some new stuff there, to
consider edge cases with the parallel_leader_participation GUC.

>> I'm working on fixing up what you posted. I'm probably not more than a
>> week away from posting a patch that I'm going to mark "ready for
>> committer". I've already made the change above, and once I spend time
>> on trying to break the few small changes needed within buffile.c I'll
>> have taken it as far as I can, most likely.
>>
>
> Okay, once you submit the patch with changes - I will do one round of
> review for the changes.

I've attached my revision. Changes include:

* Changes to plan_create_index_workers() were made along the lines
recently discussed.

* plan_create_index_workers() is now called right before the ambuild
routine is called (nbtree index builds only, of course).

* Significant overhaul of tuplesort.h contract. This had references to
the old approach, and to tqueue.c's tuple descriptor thing that was
since superseded by the typmod registry added for parallel hash join.
These were updated/removed.

* Both tuplesort.c and logtape.c now say that they cannot write to the
writable/last tape, while still acknowledging that it is in fact the
leader tape, and that this restriction is due to a restriction with
BufFiles. They also point out that if the restriction within buffile.c
ever was removed, everything would work fine.

* Added new call to BufFileExportShared() when freezing tape in logtape.c.

* Tweaks to documentation.

* pgindent ran on modified files.

* Polished the stuff that is added to buffile.c. Mostly comments that
clarify its reason for existing. Also added Assert()s.

Note that I added Heikki as an author in the commit message.
Technically, Heikki didn't actually write code for parallel CREATE
INDEX, but he did loads of independently useful work on merging + temp
file I/O that went into Postgres 10 (though this wasn't listed in the
v10 release notes). That work was done in large part to help the
parallel CREATE INDEX patch, and it did in fact help it quite
noticeably, so I think that this is warranted. Remember that with
parallel CREATE INDEX, the leader's merge occurs serially, so anything
that we can do to speed that part up is very helpful.

This revision does seem very close, but I'll hold off on changing the
status of the patch for a few more days, to give you time to give some
feedback.

--
Peter Geoghegan

Attachment Content-Type Size
0001-Add-parallel-B-tree-index-build-sorting.patch text/x-patch 161.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-01-05 22:23:08 Re: [HACKERS] Proposal: Local indexes for partitioned table
Previous Message Robert Haas 2018-01-05 22:17:21 Re: [HACKERS] Proposal: Local indexes for partitioned table