Re: Memory usage during sorting

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory usage during sorting
Date: 2012-02-27 03:37:39
Message-ID: 21342.1330313859@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> A quick Google search for external sorting algorithms suggest that the
> typical way of doing an external sort is to read data until you fill
> your in-memory buffer, quicksort it, and dump it out as a run. Repeat
> until end-of-data; then, merge the runs (either in a single pass, or
> if there are too many, in multiple passes). I'm not sure whether that
> would be better than what we're doing now, but there seem to be enough
> other people doing it that we might want to try it out. Our current
> algorithm is to build a heap, bounded in size by work_mem, and dribble
> tuples in and out, but maintaining that heap is pretty expensive;
> there's a reason people use quicksort rather than heapsort for
> in-memory sorting.

Well, the reason the heapsort approach is desirable here is that you end
up with about half as many runs to merge, because the typical run length
is significantly more than what will fit in work_mem. Don't get too
excited about micro-optimization if you haven't understood the larger
context.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-02-27 03:44:32 Re: pgstat documentation tables
Previous Message Robert Haas 2012-02-27 03:37:14 Re: Website stylesheet for local docs