Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-01-19 21:27:03
Message-ID: CAEepm=0J2GL8hF9+Q6s_jadQzxnsMTia_BY=QRFLBAbU-p8MGg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 20, 2018 at 6:32 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jan 19, 2018 at 12:16 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> Clarity on what I should do about parallel_leader_participation in the
>> next revision would be useful at this point. You seem to either want
>> me to remove it from consideration entirely, or to remove the code
>> that specifically disallows a "degenerate parallel CREATE INDEX". I
>> need a final answer on that.
>
> Right. I do think that we should do one of those things, and I lean
> towards removing it entirely, but I'm not entirely sure. Rather
> than making an executive decision immediately, I'd like to wait a few
> days to give others a chance to comment. I am hoping that we might get
> some other opinions, especially from Thomas who implemented
> parallel_leader_participation, or maybe Amit who has been reviewing
> recently, or anyone else who is paying attention to this thread.

Well, I see parallel_leader_participation as having these reasons to exist:

1. In rare circumstances, Gather might not run the plan in the leader
at all. Bugs that only show up in that case can stay hidden, so it's
good to be able to force that behaviour for testing.

2. Plans that tie up the leader process for a long time cause the
tuple queues to block, which reduces parallelism. I speculate that
some people might want to turn leader participation off in production,
but at the very least it seems useful for certain kinds of performance
testing to be able to remove this complication from the picture.

3. The planner's estimations of parallel leader contribution are
somewhat bogus, especially if the startup cost is high. It's useful
to be able to remove that problem from the picture sometimes, at least
for testing and development work.
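
To make point 3 concrete, here is a rough Python sketch of the
heuristic the planner uses to estimate the leader's contribution, as I
understand it from get_parallel_divisor() in costsize.c. The Python
function name and signature are made up for illustration; only the
arithmetic is meant to mirror the planner.

```python
def parallel_divisor(workers: int, leader_participation: bool = True) -> float:
    """Estimate how many processes' worth of effort share a parallel path.

    Hypothetical Python rendering of the planner heuristic; the real
    code is C and reads the parallel_leader_participation GUC directly.
    """
    divisor = float(workers)
    if leader_participation:
        # Assumption baked into the heuristic: the leader loses about
        # 30% of its time per worker to servicing the tuple queues.
        leader_contribution = 1.0 - 0.3 * workers
        # With four or more workers the leader is assumed to contribute
        # nothing to running the plan itself.
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor
```

With two workers this gives a divisor of about 2.4, and with four or
more workers the leader counts for nothing. The estimate depends only
on the worker count, not on what the plan actually does, which is why
it can be far off when the startup cost is high.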

Parallel CREATE INDEX doesn't have any of those problems. The only
reason I can see for it to respect parallel_leader_participation = off
is for consistency with Gather. If someone decides to run their
cluster with that setting, then it's slightly odd if CREATE INDEX
scans and sorts with one extra process, but it doesn't seem like a big
deal.

I vote for removing the GUC from consideration for now (i.e. always use
the leader), and revisiting the question again later when we have more
experience or if the parallel degree logic becomes more sophisticated
in future.

--
Thomas Munro
http://www.enterprisedb.com
