Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2017-09-20 15:17:19
Message-ID: CA+TgmoYZX8EoEQqbsrWqO72oZ9SObAU1FifgLr45h59hy=KoUQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 20, 2017 at 5:32 AM, Rushabh Lathia
<rushabh(dot)lathia(at)gmail(dot)com> wrote:
> The first application for the tuplesort here is CREATE INDEX, and that doesn't
> need randomAccess. But as you said, and as has been discussed in the thread,
> randomAccess is important and we should surely put in the effort to support
> it.

There's no direct benefit of working on randomAccess support unless we
have some code that wants to use that support for something. Indeed,
it would just leave us with code we couldn't test.

While I do agree that there are probably use cases for randomAccess, I
think what we should do right now is try to get this patch reviewed
and committed so that we have parallel CREATE INDEX for btree indexes.
And in so doing, let's keep it as simple as possible. Parallel CREATE
INDEX for btree indexes is a great feature without adding any more
complexity.

Later, anybody who wants to work on randomAccess support -- and
whatever planner and executor changes are needed to make effective use
of it -- can do so. For example, one can imagine a plan like this:

Gather
-> Merge Join
  -> Parallel Index Scan
  -> Parallel Sort
    -> Parallel Seq Scan

If the parallel sort reads out all of the output in every worker, then
it becomes legal to do this kind of thing -- it would end up, I think,
being quite similar to Parallel Hash. However, there's some question
in my mind as to whether we want to do this or, say, hash-partition both
relations and then perform separate joins on each partition. The
above plan is clearly better than what we can do today, where every
worker would have to repeat the sort, ugh, but I don't know if it's
the best plan. Fortunately, to get this patch committed, we don't
have to figure that out.
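
To make the comparison between the two strategies concrete, here is a toy sketch in Python. It is not PostgreSQL code: plain lists stand in for relations, the "workers" run one after another, and every function name (merge_join, shared_sort_plan, hash_partition_plan) is hypothetical. It only illustrates why a sort whose full output is readable by every worker makes the merge join legal, and that hash-partitioning both inputs produces the same rows.

```python
# Toy model only: lists of (key, payload) tuples stand in for relations,
# and workers are simulated sequentially. All names are hypothetical.
from collections import defaultdict


def by_key(row):
    return row[0]


def merge_join(outer_sorted, inner_sorted):
    """Equi-join two lists of (key, payload) rows already sorted by key."""
    out, i, j = [], 0, 0
    while i < len(outer_sorted) and j < len(inner_sorted):
        ko, ki = outer_sorted[i][0], inner_sorted[j][0]
        if ko < ki:
            i += 1
        elif ko > ki:
            j += 1
        else:
            # Locate the run of inner rows sharing this key.
            j_end = j
            while j_end < len(inner_sorted) and inner_sorted[j_end][0] == ko:
                j_end += 1
            # Pair every outer row with this key against the whole inner run.
            while i < len(outer_sorted) and outer_sorted[i][0] == ko:
                for r in range(j, j_end):
                    out.append((ko, outer_sorted[i][1], inner_sorted[r][1]))
                i += 1
            j = j_end
    return out


def shared_sort_plan(indexed_rel, unsorted_rel, nworkers):
    """Merge join where the sort is done once and read in full by every worker."""
    shared_sorted = sorted(unsorted_rel, key=by_key)  # one cooperative sort
    index_order = sorted(indexed_rel, key=by_key)     # index scan yields key order
    result = []
    for w in range(nworkers):
        # Each worker gets a disjoint, still-ordered slice of the index scan,
        # but must see ALL of the sorted output to find its matches.
        my_outer = index_order[w::nworkers]
        result.extend(merge_join(my_outer, shared_sorted))
    return result


def hash_partition_plan(rel_a, rel_b, nworkers):
    """Alternative: hash-partition both inputs, then join each partition alone."""
    parts_a, parts_b = defaultdict(list), defaultdict(list)
    for row in rel_a:
        parts_a[hash(row[0]) % nworkers].append(row)
    for row in rel_b:
        parts_b[hash(row[0]) % nworkers].append(row)
    result = []
    for w in range(nworkers):
        result.extend(merge_join(sorted(parts_a[w], key=by_key),
                                 sorted(parts_b[w], key=by_key)))
    return result


rel_a = [(1, "a1"), (2, "a2"), (2, "a2b"), (4, "a4")]
rel_b = [(2, "b2"), (4, "b4"), (4, "b4b"), (5, "b5"), (1, "b1")]
shared = sorted(shared_sort_plan(rel_a, rel_b, 2))
partitioned = sorted(hash_partition_plan(rel_a, rel_b, 2))
```

Both plans return the same set of joined rows. The sketch says nothing about which one is cheaper, which is exactly the open question above.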

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
