Re: Using quicksort for every external sort run

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-08-21 06:56:22
Message-ID: CANP8+jJZaSz9cRqqTAYz58DwD_L-CKnepjV+=UkRU-YiTmG2Vg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20 August 2015 at 18:41, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> > On 20 August 2015 at 03:24, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> >>
> >>
> >> The patch is ~3.25x faster than master
> >
> >
> > I've tried to read this post twice and both times my work_mem overflowed.
> > ;-)
> >
> > Can you summarize what this patch does? I understand clearly what it
> doesn't
> > do...
>
> The most important thing that it does is always quicksort runs, that
> are formed by simply filling work_mem with tuples in no particular
> order, rather than trying to make runs that are twice as large as
> work_mem on average. That's what the ~3.25x improvement concerned.
> That's actually a significantly simpler algorithm than replacement
> selection, and appears to be much faster.

Then I think this is fine, not least because it seems like a first step
towards parallel sort.

This will give more runs, so merging those needs some thought. It will also
give a more predictable number of runs, so we'll be able to predict any
merging issues ahead of time. We can more easily find out the min/max tuple
in each run, so we only merge overlapping runs.

> You might even say that it's
> a dumb algorithm, because it is less sophisticated than replacement
> selection. However, replacement selection tends to use CPU caches very
> poorly, while its traditional advantages have become dramatically less
> important due to large main memory sizes in particular. Also, it hurts
> that we don't currently dump tuples in batches, for several reasons.
> Better to do memory intense operations in batch, rather than having a
> huge inner loop, in order to minimize or prevent instruction cache
> misses. And we can better take advantage of asynchronous I/O.
>
> The complicated aspect of considering the patch is whether or not it's
> okay to not use replacement selection anymore -- is that an
> appropriate trade-off?
>

Using a heapsort is known to be poor for large heaps. We previously
discussed the idea of quicksorting the first chunk of memory, then
reallocating the heap as a smaller chunk for the rest of the sort. That
would solve the cache miss problem.

I'd like to see some discussion of how we might integrate aggregation and
sorting. A heap might work quite well for that, whereas quicksort doesn't
sound like it would work as well.

> The reason that the code has not actually been simplified by this
> patch is that I still want to use replacement selection for one
> specific case: when it is anticipated that a "quicksort with
> spillover" can occur, which is only possible with incremental
> spilling. That may avoid most I/O, by spilling just a few tuples using
> a heap/priority queue, and quicksorting everything else. That's
> compelling when you can manage it, but no reason to always use
> replacement selection for the first run in the common case where there
> well be several runs in total.
>

I think its premature to retire that algorithm - I think we should keep it
for a while yet. I suspect it may serve well in cases where we have low
memory, though I accept that is no longer the case for larger servers that
we would now call typical.

This could cause particular issues in optimization, since heap sort is
wonderfully predictable. We'd need a cost_sort() that was slightly
pessimistic to cover the risk that a quicksort might not be as fast as we
hope.

> Is that any clearer?

Yes, thank you.

I'd like to see a more general and concise plan for how sorting evolves. We
are close to having the infrastructure to perform intermediate aggregation,
which would allow that to happen during sorting when required (aggregation,
sort distinct). We also agreed some time back that parallel sorting would
be the first incarnation of parallel operations, so we need to consider
that also.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2015-08-21 12:25:53 Re: Warnings around booleans
Previous Message Pavan Deolasee 2015-08-21 06:20:11 Re: Declarative partitioning