On 13 April 2012 18:33, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> We do use insertion sort for partitions of size < 7. I assume you are
> referring to something else.
I stand corrected. My confusion came from the removal of a later
*switch* to insertion sort in
a3f0b3d68f9a5357a3f72b40a45bcc714a9e0649, which also added the
pre-sorted check where you'd expect to see the insertion switch. Of
course, the n < 7 insertion sort thing is right beside the added
check. So another, redundant copy of insertion sort was removed, and
not the one that almost every real-world quicksort implementation has.
> Heap-sorting could also benefit from some optimization in this area.
> Right now, if you heap-sort data that's already in order, it gets
> progressively slower as you crank up work_mem, because the heap gets
> deeper and so extraction and siftup get more expensive, for no real
> benefit. We could do something like check, every time we use up our
> available memory, whether the heap is already entirely in order. If
> so, we could dump all but one tuple to the current run and the start
> reading tuples again. Or maybe dump some percentage of the array,
> though that might require a bit more bookkeeping. Or maybe there's a
> smarter algorithm that would also cater to mostly-sorted data.
Well, timsort is specifically designed to take advantage of pre-sorted
data. It does appear to have a lot of traction, as wikipedia points
"Timsort has been Python's standard sorting algorithm since version
2.3. It is now also used to sort arrays in Java SE 7, and on the
Android platform. "
Somebody has produced a BSD-licensed C implementation that draws
heavily on the Python/Java one here:
It looks like it has exactly the same interface as a c stdlib qsort.
So it'd be fairly easy to produce a timsort_arg() based on this, if
anyone cares to.
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
In response to
pgsql-hackers by date
|Next:||From: Kevin Grittner||Date: 2012-04-13 18:15:10|
|Subject: Re: [COMMITTERS] pgsql: Add new replication mode
synchronous_commit = 'write'.|
|Previous:||From: Robert Haas||Date: 2012-04-13 17:33:43|
|Subject: Re: Memory usage during sorting|