Re: Parallel tuplesort, partitioning, merging, and the future

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Parallel tuplesort, partitioning, merging, and the future
Date: 2016-08-12 19:22:24
Message-ID: CA+Tgmob=Pas24FiJ9M24+=3e_8Dtz8i8i3aHjZJ83P+HJianyw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 10, 2016 at 4:54 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Wed, Aug 10, 2016 at 11:59 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> My view on this - currently anyway - is that we shouldn't conflate the
>> tuplesort with the subsequent index generation, but that we should try
>> to use parallelism within the tuplesort itself to the greatest extent
>> possible. If there is a single output stream that the leader uses to
>> generate the final index, then none of the above problems arise. They
>> only arise if you've got multiple processes actually writing to the
>> index.
>
> I'm not sure if you're agreeing with my contention about parallel
> CREATE INDEX not being a good target for partitioning here. Are you?

No. I agree that writing to the index in parallel is bad, but I think
it's entirely reasonable to try to set things up so that the leader
does as little of the final merge work itself as possible, instead
offloading that to workers. Unless, of course, we can prove that the
overhead of the final merge pass is so low that it doesn't matter
whether we offload it.

> While all this speculation about choice of algorithm is fun,
> realistically I'm not going to write the patch for a rainy day (nor for
> parallel CREATE INDEX, at least until we become very comfortable with
> all the issues I raise, which could never happen). I'd be happy to
> consider helping you improve parallel query by providing
> infrastructure like this, but I need someone else to write the client
> of the infrastructure (e.g. a parallel merge join patch), or to at
> least agree to meet me half way with an interdependent prototype of
> their own. It's going to be messy, and we'll have to do a bit of
> stumbling to get to a good place. I can sign up to that if I'm not the
> only one that has to stumble.

Fair enough.

> Serial merging still needs work, it seems.

At the risk of stating the obvious, improving serial execution
performance is always superior to comparable gains originating from
parallelism, so no complaints here about work in that area.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
