Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-01-03 04:43:27
Message-ID: CAGPqQf2xmSsSoob+ZaXh1T+5pJ=uGrXNmfgzcqrsekSqtHbcWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 3, 2018 at 9:11 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Tue, Jan 2, 2018 at 1:38 AM, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
> wrote:
> > Need to do after the indexRelation build. So I added after update of
> > pg_index,
> > as indexRelation needed for plan_create_index_worders().
> >
> > Attaching the separate patch the same.
>
> This made it so that REINDEX and CREATE INDEX CONCURRENTLY no longer
> used parallelism. I think we need to do this very late, just before
> nbtree's ambuild() routine is called from index.c.
>
>
Ahh right. We should move the plan_create_index_workers() call to
index_build() before the ambuild().

> > So you suggesting that need to do adjustment with the output of
> > compute_parallel_worker() by considering parallel_leader_participation?
>
> We know for sure that there is no reason to not use the leader process
> as a worker process in the case of parallel CREATE INDEX. So we must
> not have the number of participants (i.e. worker Tuplesortstates) vary
> based on the current parallel_leader_participation setting. While
> parallel_leader_participation can affect the number of worker
> processes requested, that's a different thing. There is no question
> about parallel_leader_participation ever being relevant to performance
> -- it's strictly a testing option for us.
>
> Even after parallel_leader_participation was added,
> compute_parallel_worker() still assumes that the sequential scan
> leader is always too busy to help. compute_parallel_worker() seems to
> think that that's something that the leader does in "rare" cases not
> worth considering -- cases where it has no worker tuples to consume
> (maybe I'm reading too much into it not caring about
> parallel_leader_participation, but I don't think so). If
> compute_parallel_worker()'s assumption was questionable before, it's
> completely wrong for parallel CREATE INDEX. I think
> plan_create_index_workers() needs to count the leader-as-worker as an
> ordinary worker, not special in any way by deducting one worker from
> what compute_parallel_worker() returns. (This only happens when it's
> necessary to compensate -- when leader-as-worker participation is
> going to go ahead.)
>
>
Yes, event with parallel_leader_participation - compute_parallel_worker()
doesn't take that into consideration. Or may be the assumption is that
launch the number of workers return by the compute_parallel_worker(),
irrespective of whether leader is going to participate in a scan or not.

I agree that plan_create_index_workers() needs to count the leader as a
normal worker for the CREATE INDEX. So what you proposing is - when
parallel_leader_participation is true launch (return value of
compute_parallel_worker() - 1)
workers. true ?

I'm working on fixing up what you posted. I'm probably not more than a
> week away from posting a patch that I'm going to mark "ready for
> committer". I've already made the change above, and once I spend time
> on trying to break the few small changes needed within buffile.c I'll
> have taken it as far as I can, most likely.
>
>
Okay, once you submit the patch with changes - I will do one round of
review for the changes.

Thanks,
Rushabh Lathia
www.EnterpriseDB.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gerdan Rezende dos Santos 2018-01-03 05:02:41 Re: CFM for January commitfest?
Previous Message Tom Lane 2018-01-03 04:34:47 Re: copy_file_range is now a Linux kernel call