Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2017-01-31 05:15:45
Message-ID: CAH2-Wzn=+MDmhOVHQ=M3TYxbCtPQsWStt0GT273XhADkiNoOvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 30, 2017 at 8:46 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> Attached is V7 of the patch.
>
> I am doing some testing. First, some superficial things from first pass:
>
> [Various minor cosmetic issues]

Oops.

> Just an observation: if you ask for a large number of workers, but
> only one can be launched, it will be constrained to a small fraction
> of maintenance_work_mem, but use only one worker. That's probably OK,
> and I don't see how to do anything about it unless you are prepared to
> make workers wait for an initial message from the leader to inform
> them how many were launched.

Actually, the leader-owned worker Tuplesort state will have the
appropriate amount, so you'd still need to have 2 participants (1
worker + leader-as-worker). And, sorting is much less sensitive to
having a bit less memory than hashing (at least when there isn't
dozens of runs to merge in the end, or multiple passes). So, I agree
that this isn't worth worrying about for a DDL statement.

> Should this 64KB minimum be mentioned in the documentation?

You mean user-visible documentation, and not just tuplesort.h? I don't
think that that's necessary. That's a ludicrously low amount of memory
for a worker to be limited to anyway. It will never come up with
remotely sensible use of the feature.

> + if (!btspool->isunique)
> + {
> + shm_toc_estimate_keys(&pcxt->estimator, 2);
> + }
>
> Project style: people always tell me to drop the curlies in cases like
> that. There are a few more examples in the patch.

I only do this when there is an "else" that must have curly braces,
too. There are plenty of examples of this from existing code, so I
think it's fine.

> + /* Wait for workers */
> + ConditionVariableSleep(&shared->workersFinishedCv,
> + WAIT_EVENT_PARALLEL_FINISH);
>
> I don't think we should reuse WAIT_EVENT_PARALLEL_FINISH in
> tuplesort_leader_wait and worker_wait. That belongs to
> WaitForParallelWorkersToFinish, so someone who see that in
> pg_stat_activity won't know which it is.

Noted.

> IIUC worker_wait() is only being used to keep the worker around so its
> files aren't deleted. Once buffile cleanup is changed to be
> ref-counted (in an on_dsm_detach hook?) then workers might as well
> exit sooner, freeing up a worker slot... do I have that right?

Yes. Or at least I think it's very likely that that will end up happening.

> Incidentally, barrier.c could probably be used for this
> synchronisation instead of these functions. I think
> _bt_begin_parallel would call BarrierInit(&shared->barrier,
> scantuplesortstates) and then after LaunchParallelWorkers() it'd call
> a new interface BarrierDetachN(&shared->barrier, scantuplesortstates -
> pcxt->nworkers_launched) to forget about workers that failed to
> launch. Then you could use BarrierWait where the leader waits for the
> workers to finish, and BarrierDetach where the workers are finished
> and want to exit.

I thought about doing that, actually, but I don't like creating
dependencies on some other uncommited patch, which is a moving target
(barrier stuff isn't committed yet). It makes life difficult for
reviewers. I put off adopting condition variables until they were
committed for the same reason -- it's was easy to do without them for
a time. I'll probably get around to it before too long, but feel no
urgency about it. Barriers will only allow me to make a modest net
removal of code, AFAIK.

Thanks
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-01-31 05:19:49 Refactoring of replication commands using printsimple
Previous Message Vitaly Burovoy 2017-01-31 05:13:15 Re: macaddr 64 bit (EUI-64) datatype support