Re: Memory usage during sorting

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory usage during sorting
Date: 2012-02-27 03:20:33
Message-ID: CA+TgmoZ1MQ6M1ZcnMKetnNwddkgVm4aza1LA83iK8-newryphg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 25, 2012 at 4:31 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> I'm not sure about the conclusion, but given this discussion, I'm
>> inclined to mark this Returned with Feedback.
>
> OK, thanks.  Does anyone have additional feed-back on how tightly we
> wish to manage memory usage?  Is trying to make us use as much memory
> as we are allowed to without going over a worthwhile endeavor at all,
> or is it just academic nitpicking?

I'm not sure, either. It strikes me that, in general, it's hard to
avoid a little bit of work_mem overrun, since we can't know whether
the next tuple will fit until we've read it, and then if it turns out
to be big, well, the best thing we can do is free it, but perhaps
that's closing the barn door after the horse has gotten out already.
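
Just to illustrate the shape of the problem, here's a toy sketch -- the
budget stands in for work_mem, strings stand in for tuples, and none of
the names come from tuplesort.c -- showing that the allocation has to
happen before we can see the size, so the overrun is already done by
the time we notice it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    const size_t budget = 64;          /* stand-in for work_mem */
    size_t       used = 0;
    const char  *input[] = { "short", "medium length",
                             "a record rather longer than all of the others",
                             NULL };
    char        *kept[8];
    int          nkept = 0;

    for (int i = 0; input[i] != NULL; i++)
    {
        char   *rec = strdup(input[i]); /* "reading" allocates first... */
        size_t  sz = strlen(rec) + 1;   /* ...only then do we learn the size */

        if (used + sz > budget)
        {
            /* The allocation already happened; all we can do now is
             * give the memory back. */
            printf("record %d would overrun the budget; freeing it\n", i);
            free(rec);
            continue;
        }
        kept[nkept++] = rec;
        used += sz;
        printf("kept record %d, %zu of %zu bytes used\n", i, used, budget);
    }

    while (nkept > 0)
        free(kept[--nkept]);
    return 0;
}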

Having recently spent quite a bit of time looking at tuplesort.c as a
result of Peter Geoghegan's work and some other concerns, I'm inclined
to think that it needs more than minor surgery. That file is peppered
with numerous references to Knuth which serve the dual function of
impressing the reader with the idea that the code must be very good
(since Knuth is a smart guy) and rendering it almost completely
impenetrable (since the design is justified by reference to a textbook
most of us probably do not have copies of).

A quick Google search for external sorting algorithms suggests that the
typical way of doing an external sort is to read data until you fill
your in-memory buffer, quicksort it, and dump it out as a run. Repeat
until end-of-data; then, merge the runs (either in a single pass, or
if there are too many, in multiple passes). I'm not sure whether that
would be better than what we're doing now, but there seem to be enough
other people doing it that we might want to try it out. Our current
algorithm is to build a heap, bounded in size by work_mem, and dribble
tuples in and out, but maintaining that heap is pretty expensive;
there's a reason people use quicksort rather than heapsort for
in-memory sorting. As a desirable side effect, I think it would mean
that we could dispense with retail palloc and pfree altogether. We
could just allocate a big chunk of memory, copy tuples into it until
it's full, using a pointer to keep track of the next unused byte, and
then, after writing the run, reset the allocation pointer back to the
beginning of the buffer. That would not only avoid the cost of going
through palloc/pfree, but also the memory overhead imposed by
bookkeeping and power-of-two rounding.
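
To make that concrete, here's a rough sketch of run formation done that
way -- a toy, not a patch; the fixed-size records, file names, and so on
are simplifications of what tuplesort would actually need, and the merge
pass is left out entirely:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORK_MEM  (1024 * 1024)     /* pretend work_mem: 1MB */
#define REC_SIZE  16                /* fixed-size records, for simplicity */

static int
cmp_rec(const void *a, const void *b)
{
    return memcmp(a, b, REC_SIZE);
}

/* Quicksort whatever is in the buffer and write it out as one run. */
static void
dump_run(char *buf, char *next, int runno)
{
    char    fname[32];
    FILE   *run;

    qsort(buf, (next - buf) / REC_SIZE, REC_SIZE, cmp_rec);
    snprintf(fname, sizeof(fname), "run%d.tmp", runno);
    run = fopen(fname, "wb");
    if (run == NULL)
        exit(1);
    fwrite(buf, 1, next - buf, run);
    fclose(run);
}

int
main(void)
{
    char   *buf = malloc(WORK_MEM);
    char   *next;
    char   *limit;
    char    rec[REC_SIZE];
    int     runno = 0;

    if (buf == NULL)
        return 1;
    next = buf;                         /* bump-allocation pointer */
    limit = buf + WORK_MEM;             /* WORK_MEM is a multiple of REC_SIZE */

    while (fread(rec, REC_SIZE, 1, stdin) == 1)
    {
        memcpy(next, rec, REC_SIZE);    /* "allocate" by bumping the pointer */
        next += REC_SIZE;

        if (next == limit)              /* buffer full: quicksort and dump */
        {
            dump_run(buf, next, runno++);
            next = buf;                 /* reset -- no per-tuple pfree needed */
        }
    }

    if (next > buf)                     /* final, partial run */
        dump_run(buf, next, runno++);

    free(buf);
    fprintf(stderr, "wrote %d run(s); the merge pass is not shown\n", runno);
    return 0;
}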

If we do want to stick with the current algorithm, there seem to be
some newer techniques for cutting down on the heap maintenance
overhead. Heikki's been investigating that a bit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
