|From:||Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>|
|To:||Peter Geoghegan <pg(at)bowt(dot)ie>|
|Cc:||Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>|
|Subject:||Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)|
Thanks for the review.
On Fri, Dec 8, 2017 at 6:27 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Thu, Dec 7, 2017 at 12:25 AM, Rushabh Lathia
> <rushabh(dot)lathia(at)gmail(dot)com> wrote:
> > 0001-Add-parallel-B-tree-index-build-sorting_v14.patch
> Cool. I'm glad that we now have a patch that applies cleanly against
> master, while adding very little to buffile.c. It feels like we're
> getting very close here.
> >> * You didn't point out the randomAccess restriction in tuplesort.h.
> > I did, it's there in the file header comments.
> I see what you wrote in tuplesort.h here:
> > + * algorithm, and are typically only used for large amounts of data.
> > + * that parallel sorts is not support for random access to the sort
> This should say "...are not supported when random access is requested".
> > Added handling for parallel_leader_participation as well as deleted
> > compile time option force_single_worker.
> I still see this:
> > +
> > +/*
> > + * A parallel sort with one worker process, and without any
> > + * state may be used for testing the parallel tuplesort infrastructure.
> > + */
> > +#ifdef NOT_USED
> > +#define FORCE_SINGLE_WORKER
> > +#endif
> Looks like you missed this FORCE_SINGLE_WORKER hunk -- please remove it.
> >> The parallel_leader_participation docs will also need to be updated.
> > Done.
> I don't see this. There is no reference to
> parallel_leader_participation in the CREATE INDEX docs, nor is there a
> reference to CREATE INDEX in the parallel_leader_participation docs.
I thought parallel_leader_participation is a generic GUC that takes effect
for all parallel operations, isn't it? On that understanding, I updated the
parallel_leader_participation documentation in config.sgml to keep it
general.
> > Also performed more testing with the patch, with
> > parallel_leader_participation
> > ON and OFF. Found one issue, where earlier we always used to call
> > _bt_leader_sort_as_worker() but now need to skip the call if
> > parallel_leader_participation
> > is OFF.
> Hmm. I think the local variable within _bt_heapscan() should go back.
> Its value should be directly taken from parallel_leader_participation
> assignment, once. There might be some bizarre circumstances where it
> is possible for the value of parallel_leader_participation to change
> in flight, causing a race condition: we start with the leader as a
> participant, and change our mind later within
> _bt_leader_sort_as_worker(), causing the whole CREATE INDEX to hang.
> Even if that's impossible, it seems like an improvement in style to go
> back to one local variable controlling everything.
Yes, it looks like an impossible situation to me as well, but it still
makes sense to use one local variable and always read the value from
that.
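The pattern being discussed can be sketched in a self-contained form (the names here are illustrative stand-ins, not the actual nbtsort.c code): the GUC-like global is read exactly once into a local variable, so every later decision within one sort is based on the same snapshot of the flag, even if the global changes in flight.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the parallel_leader_participation GUC.  In
 * PostgreSQL this is a global that could, in principle, change between
 * reads during a single CREATE INDEX.
 */
static bool parallel_leader_participation = true;

/*
 * Number of worker tuplesort states to wait for, given nworkers launched
 * workers.  Reading the flag exactly once means the launch decision and
 * the wait decision can never disagree, avoiding the hang described above.
 */
static int
worker_sort_states(int nworkers)
{
    bool leaderparticipates = parallel_leader_participation; /* read once */

    if (leaderparticipates)
        return nworkers + 1;    /* the leader acts as a worker too */
    return nworkers;
}
```

Any code downstream of this point would consult `leaderparticipates` (passed along as needed), never the global again.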
> Style issue here:
> > + long start_block = file->numFiles * BUFFILE_SEG_SIZE;
> > + int newNumFiles = file->numFiles + source->numFiles;
> Shouldn't start_block conform to the surrounding camelCase style?
> Finally, two new thoughts on the patch, that are not responses to
> anything you did in v14:
> 1. Thomas' barrier abstraction was added by commit 1145acc7. I think
> that you should use a static barrier in tuplesort.c now, and rip out
> the ConditionVariable fields in the Sharedsort struct. It's only a
> slightly higher level of abstraction for tuplesort.c, which makes only
> a small difference given the simple requirements of tuplesort.c.
> However, I see no reason to not go that way if that's the new
> standard, which it is. This looks like it will be fairly easy.
Pending. As per Thomas' explanation, it seems the barrier APIs need some
more work first.
> 2. Does the plan_create_index_workers() cost model need to account for
> parallel_leader_participation, too, when capping workers? I think that
> it does.
> The relevant planner code is:
> > + /*
> > + * Cap workers based on available maintenance_work_mem as needed.
> > + *
> > + * Note that each tuplesort participant receives an even share of the
> > + * total maintenance_work_mem budget. Aim to leave workers (where
> > + * leader-as-worker Tuplesortstate counts as a worker) with no less
> > + * than 32MB of memory.  This leaves cases where maintenance_work_mem
> > + * is set to 64MB immediately past the threshold of being capable of
> > + * launching a single parallel worker to sort.
> > + */
> > + sort_mem_blocks = (maintenance_work_mem * 1024L) / BLCKSZ;
> > + min_sort_mem_blocks = (32768L * 1024L) / BLCKSZ;
> > + while (parallel_workers > min_parallel_workers &&
> > +        sort_mem_blocks / (parallel_workers + 1) < min_sort_mem_blocks)
> > +     parallel_workers--;
> This parallel CREATE INDEX planner code snippet is about the need to
> have low per-worker maintenance_work_mem availability prevent more
> parallel workers from being added to the number that we plan to
> launch. Each worker tuplesort state needs at least 32MB. We clearly
> need to do something here.
> While it's always true that "leader-as-worker Tuplesortstate counts as
> a worker" in v14, I think that it should only be true in the next
> revision of the patch when parallel_leader_participation is actually
> true (IOW, we should only add 1 to parallel_workers within the loop
> invariant in that case). The reason why we need to consider
> parallel_leader_participation within this plan_create_index_workers()
> code is simple: During execution, _bt_leader_sort_as_worker() uses
> "worker tuplesort states"/btshared->scantuplesortstates to determine
> how much of a share of maintenance_work_mem each worker tuplesort
> gets. Our planner code needs to take that into account, now that the
> nbtsort.c parallel_leader_participation behavior isn't just some
> obscure debug option. IOW, the planner code needs to be consistent
> with the nbtsort.c execution code.
Ah, nice catch. I passed a local variable (leaderasworker) to
plan_create_index_workers() rather than reading the value directly from
parallel_leader_participation (for the same reasons you explained earlier).
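To make the consequence concrete, here is a self-contained model of the capping loop with the leader-as-worker share made conditional. The constants and variable names mirror the quoted snippet, but `cap_workers` itself is an illustrative sketch, not the actual planner function; maintenance_work_mem is in KB, as in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

#define BLCKSZ 8192

/*
 * Cap parallel_workers so that each tuplesort participant gets at least
 * 32MB of the maintenance_work_mem budget.  The leader's tuplesort only
 * consumes a share when leaderparticipates is true, keeping the planner
 * consistent with how _bt_leader_sort_as_worker() divides memory at
 * execution time.
 */
static int
cap_workers(int parallel_workers, int maintenance_work_mem,
            bool leaderparticipates)
{
    long sort_mem_blocks = ((long) maintenance_work_mem * 1024L) / BLCKSZ;
    long min_sort_mem_blocks = (32768L * 1024L) / BLCKSZ;
    int min_parallel_workers = 0;
    int leader_states = leaderparticipates ? 1 : 0;

    while (parallel_workers > min_parallel_workers &&
           sort_mem_blocks / (parallel_workers + leader_states) <
           min_sort_mem_blocks)
        parallel_workers--;

    return parallel_workers;
}
```

With maintenance_work_mem = 64MB and leader participation on, one worker plus the leader each get exactly 32MB, so a single worker is launched; with participation off, the same budget supports two workers, which is exactly the inconsistency the planner change addresses.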