Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>
Subject: Re: Using quicksort for every external sort run
Date: 2016-02-04 09:46:15
Message-ID: CAM3SWZS7fm8a2fjdPoMyz+U4_bHVP1sueU722XoBvEF27p1hog@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 30, 2016 at 5:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I meant use "quicksort with spillover" simply because an estimated
>> 90%+ of all tuples have already been consumed. Don't consider the
>> tuple width, etc.
>
> Hmm, it's a thought.

To be honest, it's a bit annoying that this is the one issue we're
stuck on, because "quicksort with spillover" is clearly of less
importance overall. (This is distinct from the question of not using a
replacement selection style heap for the first run much of the time,
which seems to be a discussion about whether and to what extent the
*traditional* advantages of replacement selection hold today, as
opposed to a discussion about a very specific crossover point in my
patch.)

>> Really? Just try it with a heap that is not tiny. Performance tanks.
>> The fact that replacement selection can produce one long run then
>> becomes a liability, not a strength. With a work_mem of something like
>> 1GB, it's *extremely* painful.
>
> I'm not sure exactly what you think I should try. I think a couple of
> people have expressed the concern that your patch might regress things
> on data that is all in order, but I'm not sure if you think I should
> try that case or some case that is not-quite-in-order. "I don't see
> that you've justified that statement" is referring to the fact that
> you presented no evidence in your original post that it's important to
> sometimes use quicksorting even for run #1. If you've provided some
> test data illustrating that point somewhere, I'd appreciate a pointer
> back to it.

I think the answer to what you should try is simple: any case
involving a large heap (say, a work_mem of 1GB). No other factor, like
correlation, seems to change the conclusion that a large heap is
generally bad.

If you have a correlation, then that is *worse* if "quicksort with
spillover" always has us use a heap for the first run, because it
prolongs the pain of using the cache-inefficient heap (note that this
is an observation about "quicksort with spillover" in particular, and
not replacement selection in general). The problem you'll see is that
there is a large heap which is __slow__ to spill from, and that's
pretty obvious with or without a correlation. In general, it seems
unlikely that producing one long run (i.e. no merge at all, because
the heap got "lucky" with a correlation and built a single run) can
ever hope to make up for this.
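
To make the cache-efficiency point concrete, here is a rough sketch of
the replacement selection inner loop (pseudocode with placeholder
helper names -- not tuplesort.c):

    /*
     * Rough sketch of replacement selection run formation; helper
     * names are placeholders, not tuplesort.c.  The point: every
     * single input tuple pays for a sift-down, and with a 1GB heap
     * those comparisons touch memory scattered far beyond what any
     * CPU cache can hold.
     */
    while ((tup = read_next_tuple()) != NULL)
    {
        emit_to_current_run(heap[0]);       /* write current minimum */

        if (compare(tup, heap[0]) >= 0)
            heap[0] = tag_current_run(tup); /* can extend this run */
        else
            heap[0] = tag_next_run(tup);    /* must wait for next run */

        sift_down(heap, heap_size);         /* cache-hostile at 1GB */
    }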

It *could* still make up for it if:

1. There isn't much to make up for in the first place, because the
heap is CPU cache resident. Testing this with a work_mem that is the
same size as CPU L3 cache seems a bit pointless to me, and I think
we've seen that a few times.

and:

2. There are many passes required without a replacement selection
heap, because the volume of data is just so much greater than what the
low work_mem setting allows for. Replacement selection makes the
critical difference because there is a correlation, perhaps strong
enough to make it one or two runs rather than, say, 10 or 20 or 100
(see the back-of-the-envelope sketch below).
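
For what it's worth, here is that arithmetic as a runnable sketch; the
per-tape buffer figure is an assumption for illustration, not the
actual tuplesort.c accounting:

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        double data_mb = 10.0 * 1024;   /* 10GB of input */
        double work_mem_mb = 4.0;       /* default work_mem */
        double tape_buf_mb = 0.25;      /* assumed per-tape buffer */

        double runs = ceil(data_mb / work_mem_mb);       /* 2560 */
        double order = floor(work_mem_mb / tape_buf_mb); /* 16-way */
        double passes = ceil(log(runs) / log(order));    /* 3 */

        /* At work_mem = 64MB: 160 runs, 256-way merge, one pass. */
        printf("%.0f runs, %.0f-way merge, %.0f pass(es)\n",
               runs, order, passes);
        return 0;
    }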

I've already mentioned many times that linear growth in the size of
work_mem sharply reduces the need for additional passes during the
merge phase (the observation about quadratic growth that I won't
repeat). These days, it's hard to recommend anything other than "use
more memory" to someone trying to use 4MB to sort 10GB of data. Yeah,
it would also be faster to use replacement selection for the first run
in the hope of getting lucky (actually lucky this time; no quotes),
but it's hard to imagine that that's going to be a better option, no
matter how frugal the user is. Helping users recognize when they could
use more memory effectively seems like the best strategy. That was the
idea behind multipass_warning, but you didn't like that (Greg Stark
was won over on multipass_warning, though). I hope we can offer
something roughly like that at some point (a view?), because it makes
sense.
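
To put numbers on the quadratic-growth point, using the same assumed
figures as the sketch above: one merge pass can combine roughly
(work_mem / per-tape buffer) runs of roughly work_mem each, or about
work_mem^2 / buffer of data. At 4MB that's around 64MB merged in one
pass; at 64MB it's around 16GB. 16x the memory buys roughly 256x the
one-pass capacity, which is why "use more memory" beats everything
else.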

> How about always starting with replacement selection, but limiting the
> amount of memory that can be used with replacement selection to some
> small value? It could be a separate GUC, or a hard-coded constant
> like 20MB if we're fairly confident that the same value will be good
> for everyone. If the tuples aren't in order, then we'll pretty
> quickly come to the end of the first run and switch to quicksort.

This seems acceptable, although note that we don't have to decide
until we actually reach the work_mem limit.
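
As a sketch, that policy might look something like the following
(replacement_sort_mem and the helpers are hypothetical names, not the
patch; the test happens only once the memtuples array first fills):

    /*
     * Sketch of the proposed policy -- hypothetical GUC and helper
     * names.  Called only when the memtuples array first fills to
     * the work_mem limit, since nothing needs to be decided before
     * that point.
     */
    static void
    choose_first_run_strategy(Tuplesortstate *state)
    {
        /* replacement_sort_mem: hypothetical cap, e.g. 20MB */
        if (state->allowedMem <= replacement_sort_mem)
        {
            /*
             * The heap stays CPU cache resident, so gambling on
             * one long run is cheap.
             */
            begin_replacement_selection_run(state);
        }
        else
        {
            /* Large heap: quicksort work_mem-sized runs instead */
            begin_quicksorted_run(state);
        }
    }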

If you want to use a heap for the first run, I'm not excited about the
idea, but if you insist then I'm glad that you at least propose to
limit it to the kind of cases that we *actually* saw regressed (i.e.
low work_mem settings -- like the default work_mem setting, 4MB).
We've seen no actual case with a larger work_mem that benefits from
using a heap, even *with* a strong correlation (which is actually the
*worst* case of all); that's where I am determined to avoid using a
heap automatically.

It wasn't my original insight that replacement selection has become
all but obsolete. It took me a while to come around to that point of
view. One 2014 SIGMOD paper says of replacement selection sort:

"Finally, there has been very little interest in replacement selection
sort and its variants over the last 15 years. This is easy to
understand when one considers that the previous goal of replacement
selection sort was to reduce the number of external memory passes to
2."

> If we do end up using replacement selection for the whole sort, the
> smaller heap is an advantage. What I like about this sort of thing is
> that it adds no reliance on any estimate; it's fully self-tuning.

Fine, but the point of "quicksort with spillover" is that it avoids
I/O entirely. I'm not promoting it as useful for any of the reasons
that replacement selection was traditionally useful (on 1970s
hardware). So we aren't much closer to working out a better cost model
for "quicksort with spillover" (though I guess you weren't really
talking about that), which remains an annoying sticking point (as
already mentioned).
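
To restate in sketch form what "quicksort with spillover" actually
buys (placeholder helpers, not the patch): when work_mem fills and the
estimate says 90%+ of the input has already been consumed, only the
smallest few tuples ever touch disk:

    if (estimated_fraction_consumed(state) > 0.90)
    {
        spill_lowest_tuples_to_tape(state); /* tiny run on tape */
        quicksort_in_memory(state);         /* cache-efficient bulk */
        merge_tape_with_memory(state);      /* on-the-fly final merge */
    }
    else
        form_quicksorted_runs(state);       /* normal external sort */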

--
Peter Geoghegan
