Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-08-20 17:41:36
Message-ID: CAM3SWZSBFNwoEM8hb4VxbNS5H2gg-f_s84j=Bzpi_gt-qA4KMQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 20 August 2015 at 03:24, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>>
>>
>> The patch is ~3.25x faster than master
>
>
> I've tried to read this post twice and both times my work_mem overflowed.
> ;-)
>
> Can you summarize what this patch does? I understand clearly what it doesn't
> do...

The most important thing that it does is always quicksort runs, that
are formed by simply filling work_mem with tuples in no particular
order, rather than trying to make runs that are twice as large as
work_mem on average. That's what the ~3.25x improvement concerned.
That's actually a significantly simpler algorithm than replacement
selection, and appears to be much faster. You might even say that it's
a dumb algorithm, because it is less sophisticated than replacement
selection. However, replacement selection tends to use CPU caches very
poorly, while its traditional advantages have become dramatically less
important due to large main memory sizes in particular. Also, it hurts
that we don't currently dump tuples in batches, for several reasons.
Better to do memory intense operations in batch, rather than having a
huge inner loop, in order to minimize or prevent instruction cache
misses. And we can better take advantage of asynchronous I/O.

The complicated aspect of considering the patch is whether or not it's
okay to not use replacement selection anymore -- is that an
appropriate trade-off?

The reason that the code has not actually been simplified by this
patch is that I still want to use replacement selection for one
specific case: when it is anticipated that a "quicksort with
spillover" can occur, which is only possible with incremental
spilling. That may avoid most I/O, by spilling just a few tuples using
a heap/priority queue, and quicksorting everything else. That's
compelling when you can manage it, but no reason to always use
replacement selection for the first run in the common case where there
well be several runs in total.

Is that any clearer? To borrow a phrase from the processor
architecture community, from a high level this is a "Brainiac versus
Speed Demon" [1] trade-off. (I wish that there was a widely accepted
name for this trade-off.)

[1] http://www.lighterra.com/papers/modernmicroprocessors/#thebrainiacdebate
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2015-08-20 19:11:02 Re: allowing wal_level change at run time
Previous Message Andrew Dunstan 2015-08-20 17:37:55 Re: TAP tests are badly named