Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>
Subject: Re: Using quicksort for every external sort run
Date: 2016-01-27 13:20:03
Message-ID: CAM3SWZROc8KSVexHnrpxJraOCmDNd1-Mw19cs0+LSyzOWn8hTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 23, 2015 at 9:37 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> But yes, let me concede more clearly: the cost model is based on
>> frobbing. But at least it's relatively honest about that, and is
>> relatively simple. I think it might be possible to make it simpler,
>> but I have a feeling that anything we can come up with will basically
>> have the same quality that you so dislike. I don't know how to do
>> better. Frankly, I'd rather be roughly correct than exactly wrong.
>
> Sure, but the fact that the model has huge discontinuities - perhaps
> most notably a case where adding a single tuple to the estimated
> cardinality changes the crossover point by a factor of two - suggests
> that you are probably wrong. The actual behavior does not change
> sharply when the size of the SortTuple array crosses 1GB, but the
> estimates do.

Here is some fairly interesting analysis of Quicksort vs. Heapsort,
from Bentley, coauthor of our own Quicksort implementation:

https://youtu.be/QvgYAQzg1z8?t=16m15s

(This link picks up at the right point to see the comparison, complete
with an interesting graph).

It probably doesn't tell you much that you didn't already know, at
least at this exact point, but it's nice to see Bentley's graph. This
perhaps gives you some idea of why my "quicksort with spillover" cost
model had a cap of MaxAllocSize of SortTuples, past which we always
needed a very compelling case. That was my rough guess of where the
Heapsort graph takes a sharp upward turn. Before then, Bentley shows
that it's close enough to a straight line.

Correct me if I'm wrong, but I think that the only outstanding issue
with all patches posted here so far is the "quicksort with spillover"
cost model. Hopefully this can be cleared up soon. As I've said, I am
very receptive to other people's suggestions about how that should
work.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-01-27 13:30:35 WAL Re-Writes
Previous Message Anastasia Lubennikova 2016-01-27 13:19:43 Re: Patch: fix lock contention for HASHHDR.mutex