Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>
Subject: Re: Using quicksort for every external sort run
Date: 2015-12-07 00:25:23
Message-ID: CAM3SWZTx-RuMny3Bo0tiQCz-Kbqpkrxkse6DvsGoaG0vX+uSdw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 24, 2015 at 4:33 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> So, the bottom line is: This patch seems very good, is unlikely to
> have any notable downside (no case has been shown to be regressed),
> but has yet to receive code review. I am working on a new version with
> the first two commits consolidated, and better comments, but that will
> have the same code, unless I find bugs or am dissatisfied. It mostly
> needs thorough code review, and to a lesser extent some more
> performance testing.

I'm currently spending a lot of time working on parallel CREATE INDEX.
I should not delay posting a new version of my patch series any
further, though. I hope to polish up parallel CREATE INDEX to be able
to show people something in a couple of weeks.

This version features consolidated commits, the removal of the
multipass_warning parameter, and improved comments and commit
messages. It has almost entirely unchanged functionality.

The only functional changes are:

* The function useselection() is taught to distrust an obviously bogus
caller reltuples hint (when it's already less than half of what we
know to be the minimum number of tuples that the sort must sort,
immediately after LACKMEM() first becomes true -- this is probably a
generic estimate).

* Prefetching only occurs when writing tuples. Explicit prefetching
appears to hurt in some cases, as David Rowley has shown over on the
dedicated thread. But it might still be that writing tuples is a case
that is simple enough to benefit consistently, due to the relatively
uniform processing that memory latency can hide behind for that case
(before, the same prefetching instructions were used for CREATE INDEX
and for aggregates, for example).

Maybe we should consider trying to get patch 0002 (the memory
pool/merge patch) committed first, something Greg Stark suggested
privately. That might actually be an easier way of integrating this
work, since it changes nothing about the algorithm we use for merging
(it only improves memory locality), and so is really an independent
piece of work (albeit one that makes a huge overall difference due to
the other patches increasing the time spent merging in absolute terms,
and especially as a proportion of the total).

--
Peter Geoghegan

Attachment Content-Type Size
0003-Perform-memory-prefetching-when-writing-memtuples.patch text/x-patch 7.9 KB
0002-Use-tuple-proper-memory-pool-for-tuplesort-merge.patch text/x-patch 24.6 KB
0001-Quicksort-when-performing-external-sorts.patch text/x-patch 57.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-12-07 00:45:02 Re: Using quicksort for every external sort run
Previous Message Peter Geoghegan 2015-12-07 00:04:05 Re: Using quicksort for every external sort run