Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2016-11-10 03:18:56
Message-ID: CAM3SWZRG+zyxBDJuMfgu6vBFA9q-eY9mfLKqkq39m0OT3sNcAw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 9, 2016 at 6:57 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I guess that's possible, but the problem with polyphase merge is that
> the increased I/O becomes a pretty significant cost in a hurry.

Not if you have a huge RAID array. :-)

Obviously I'm not seriously suggesting that we revise the cap from 500
to 7. We're only concerned about the constant factors here. There is
clearly a need to make some simplifying assumptions. I think that you
understand this very well, though.

> Maybe another way of putting this is that, while there's clearly a
> benefit to having some kind of a cap, it's appropriate to pick a large
> value, such as 500. Having no cap at all risks creating many extra
> tapes that just waste memory, and also risks an unduly
> cache-inefficient final merge. Reining that in makes sense.
> However, we can't rein it in too far or we'll create slow polyphase
> merges in cases that are reasonably likely to occur in real life.

I completely agree with your analysis.
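
To put rough numbers on the I/O side of that trade-off, here is a
back-of-the-envelope sketch. It assumes a plain balanced k-way merge in
which every pass reads and writes all of the data once. That is a
simplification, not the polyphase algorithm tuplesort.c actually uses.
The function name merge_passes and the figure of 1000 initial runs are
made up for illustration; only the caps of 500 and 7 come from the
discussion above.

```python
def merge_passes(num_runs: int, fan_in: int) -> int:
    """Count passes a balanced fan_in-way merge needs to reach one run.

    Simplified model (an assumption, not PostgreSQL's polyphase merge):
    each pass merges groups of up to fan_in runs into one run apiece,
    reading and writing the whole data set once.
    """
    if num_runs < 1:
        raise ValueError("num_runs must be at least 1")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    passes = 0
    while num_runs > 1:
        # Ceiling division: a partial group still produces one output run.
        num_runs = (num_runs + fan_in - 1) // fan_in
        passes += 1
    return passes


if __name__ == "__main__":
    # 1000 initial runs is a hypothetical workload, chosen for illustration.
    for cap in (7, 500):
        print(f"cap={cap}: {merge_passes(1000, cap)} passes over the data")
```

Under this model, 1000 runs need 4 passes with a cap of 7 and 2 passes
with a cap of 500, so the small cap roughly doubles the merge I/O. The
memory and cache cost of wide merges is not modelled here at all.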

--
Peter Geoghegan
