Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2016-10-12 19:05:11
Message-ID: CAM3SWZR9oBEyL92kQ5SATuzd-piwdAuhhZvnUAJW9U3n5ZOxpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 12, 2016 at 11:09 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I realize that you are primarily targeting utility commands here, and
> that is obviously great, because making index builds faster is very
> desirable. However, I'd just like to talk for a minute about how this
> relates to parallel query. With Rushabh's Gather Merge patch, you can
> now have a plan that looks like Gather Merge -> Sort -> whatever.
> That patch also allows other patterns that are useful completely
> independently of this patch, like Finalize GroupAggregate -> Gather
> Merge -> Partial GroupAggregate -> Sort -> whatever, but the Gather
> Merge -> Sort -> whatever path is very related to what this patch
> does. For example, instead of committing this patch at all, we could
> try to funnel index creation through the executor, building a plan of
> that shape, and using the results to populate the index. I'm not
> saying that's a good idea, but it could be done.

Right, but that would be essentially the same approach as mine, but, I
suspect, less efficient and more complicated. More importantly, it
wouldn't be partitioning, and partitioning is what we really need
within the executor.

> On the flip side, what if anything can queries hope to get out of
> parallel sort that they can't get out of Gather Merge? One
> possibility is that a parallel sort might end up being substantially
> faster than Gather Merge-over-non-parallel sort. In that case, we
> obviously should prefer it.

I must admit that I don't know enough about it to comment just yet.
Offhand, it occurs to me that the Gather Merge sorted input could come
from a number of different types of paths/nodes, whereas adopting what
I've done here could only work more or less equivalently to "Gather
Merge -> Sort -> Seq Scan" -- a special case, really.

> For example, it's possible that you might want to have all
> workers participate in sorting some data and then divide the result of
> the sort into equal ranges that are again divided among the workers,
> or that you might want all workers to sort and then each worker to
> read a complete copy of the output data. But these don't seem like
> particularly mainstream needs, nor do they necessarily seem like
> problems that parallel sort itself should be trying to solve.

This project of mine is about parallelizing tuplesort.c, which isn't
really what you want for parallel query -- you shouldn't try to scope
the problem as "make the sort more scalable using parallelism" there.
Rather, you want to scope it at "make the execution of the entire
query more scalable using parallelism", which is really quite a
different thing, which necessarily involves the executor having direct
knowledge of partition boundaries. Maybe the executor enlists
tuplesort.c to help with those boundaries to some degree, but that
whole thing is basically something which treats tuplesort.c as a low
level primitive.

> The
> Volcano paper[1], one of the oldest and most-cited sources I can find
> for research into parallel execution and with a design fairly similar
> to our own executor, describes various variants of what they call
> Exchange, of which what we now call Gather is one.

I greatly respect the work of Goetz Graef, including his work on the
Volcano paper. Graef has been the single biggest external influence on
my work on Postgres.

> They describe
> another variant called Interchange, which acts like a Gather node
> without terminating parallelism: every worker process reads the
> complete output of an Interchange, which is the union of all rows
> produced by all workers running the Interchange's input plan. That
> seems like a better design than coupling such data flows specifically
> to parallel sort.
>
> I'd like to think that parallel sort will help lots of queries, as
> well as helping utility commands, but I'm not sure it will. Thoughts?

You are right that I'm targeting the cases where we can get real
benefits without really changing the tuplesort.h contract too much.
This is literally the parallel tuplesort.c patch, which probably isn't
very useful for parallel query, because the final output is always
consumed serially here (this doesn't matter all that much for CREATE
INDEX, I believe). This approach of mine seems like the simplest way
of getting a large benefit to users involving parallelizing sorting,
but I certainly don't imagine it to be the be all and end all.

I have at least tried to anticipate how tuplesort.c will eventually
serve the needs of partitioning for the benefit of parallel query. My
intuition is that you'll have to teach it about partitioning
boundaries fairly directly -- it won't do to add something generic to
the executor. And, it probably won't be the only thing that needs to
be taught about them.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Devrim Gündüz 2016-10-12 19:05:21 Re: Non-empty default log_line_prefix
Previous Message Robert Haas 2016-10-12 19:02:53 Re: pg_dump getBlobs query broken for 7.3 servers