Re: Memory usage during sorting

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory usage during sorting
Date: 2012-04-14 13:34:56
Message-ID: CAEYLb_Uhwz0BRQR42GJvmcUtSxDW1WYOzD5fOzE-3b1jPrCGpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14 April 2012 13:32, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Fri, Apr 13, 2012 at 7:01 PM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
>> Well, timsort is specifically designed to take advantage of pre-sorted
>> data. It does appear to have a lot of traction, as wikipedia points
>> out:
>
> I hadn't heard of it. But reading up on it it does seem like a good
> fit for us. It trades some additional storage -- an array of pointers
> into the sort array where in our case the pointers would be much
> smaller than a whole SortTuple -- for fewer comparisons -- which in
> our case are often much slower than a simple integer comparison.

I wouldn't go so far as to suggest getting rid of quicksort, of
course. Quicksort is generally faster than other average case O(n log
n) algorithms in practice, for various reasons, principal among them
that it takes advantage of memory locality so well. I don't think that
it's a coincidence that timsort is popular in Python and Java land.
Even the Python C implementation has to sort integers through all that
PyObject reference indirection, I think. I'd now speculate that an
appropriate use of this algorithm might be to simply use it with types
that don't have a SortSupport function, that are largely passed by
reference, and have expensive comparisons. FWIW, I started playing
with adding timsort to Postgres last night:

https://github.com/Peter2ndQuadrant/postgres/tree/timsort

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2012-04-14 13:54:39 Re: Patch: add timing of buffer I/O requests
Previous Message Magnus Hagander 2012-04-14 13:05:50 Re: column name of pg_stat_replication.backend_start