Re: Memory usage during sorting

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory usage during sorting
Date: 2012-04-13 16:42:37
Message-ID: CAEYLb_WuP-POiDykip0Dcdy07k3GC5Dn4riK5BFY-JWg8Eq9RA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 13 April 2012 16:03, Greg Stark <stark(at)mit(dot)edu> wrote:
> Fwiw it also only holds for comparison sorts. If you can sort your
> items without actually comparing each item with the others then you
> aren't bound by it at all. Notably algorithms like radix sort and
> direct sort aren't bound by it and are O(n). I'm hoping to look at
> trying some of these for integer sorts when they apply using the new
> sort specialization code Peter added.

Actually, though a later revision of the patch did nominally allow for
user-defined sorting functions through the SortSupport infrastructure
(I didn't intend that it would be complete/useful enough to be really
worth documenting), the version that was finally committed did not.
However, there is a fairly obvious entry point for a radix sort, which
is here:

/*
* We were able to accumulate all the tuples within the allowed
* amount of memory. Just qsort 'em and we're done.
*/
if (state->memtupcount > 1)
{
/* Can we use the single-key sort function? */
if (state->onlyKey != NULL)
qsort_ssup(state->memtuples, state->memtupcount,
state->onlyKey);
else
qsort_tuple(state->memtuples,
state->memtupcount,
state->comparetup,
state);
}

The only specialisation that occurs here is between tuple sorts of a
single key and multiple (type-specific specialisations were ultimately
not judged to be worth the increase in binary size relative to their
diminishing benefits). You'd probably set-up a type-specific
positional notation machinery within the state's SortSupport struct
(the type-specific abstraction through which we elide the SQL function
call machinery for types that support it).

One insight that I had at the time was that text comparisons where the
c locale isn't used are really rather expensive, and I doubt that
there is too much that can be done to address that directly. However,
if we were to support timsort, that could substantially cut down on
the number of comparisons for text columns, which could be a real win.
Maybe that would happen through some explicit mechanism, or maybe the
planner could actually infer that it was likely to be optimal to use
timsort based on a high statistical correlation between physical row
ordering and logical ordering of a key column's values.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-04-13 16:48:45 Re: [trivial patch] grammar fixes in doc/src/sgml/high-availability.sgml
Previous Message Robert Haas 2012-04-13 16:42:23 Re: Improving our clauseless-join heuristics