Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2016-11-10 01:03:17
Message-ID: CAM3SWZSBseLRRPFT8wktML57QAFayt2e1zeaZVJp7bZg6a4gTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 9, 2016 at 4:54 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> It's more complicated than that. As I said, I think that Knuth
> basically had it right with his sweet spot of 7. I think that commit
> df700e6b40195d28dc764e0c694ac8cef90d4638 was effective in large part
> because a one-pass merge avoided certain overheads not inherent to
> polyphase merge, like all that memory accounting stuff, extra palloc()
> traffic, etc. The expanded use of per tape buffering we have even in
> multi-pass cases likely makes that much less true for us these days.

Also, logtape.c fragmentation made multiple merge pass cases
experience increased random I/O in a way that was only an accident of
our implementation. We've fixed that now, but that problem must have
added further cost that df700e6b40195d28dc764e0c694ac8cef90d4638
*masked* when it was commited in 2006. (I do think that the problem
with the merge heap maintenance fixed recently in
24598337c8d214ba8dcf354130b72c49636bba69 was the biggest problem that
the 2006 work masked, though).

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-11-10 02:57:20 Re: Parallel tuplesort (for parallel B-Tree index creation)
Previous Message Peter Geoghegan 2016-11-10 00:54:34 Re: Parallel tuplesort (for parallel B-Tree index creation)