Re: Memory usage during sorting

From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory usage during sorting
Date: 2012-03-20 20:26:31
Message-ID: 4F68E7F7.6080004@nasby.net
Lists: pgsql-hackers

On 3/18/12 10:25 AM, Tom Lane wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>> On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>>> Anyway, I think the logtape could use redoing.
>> The problem there is that none of the files can be deleted until it
>> was entirely read, so you end up with all the data on disk twice. I
>> don't know how often people run their databases so close to the edge
>> on disk space that this matters, but someone felt that that extra
>> storage was worth avoiding.
> Yeah, that was me, and it came out of actual user complaints ten or more
> years back. (It's actually not 2X growth but more like 4X growth
> according to the comments in logtape.c, though I no longer remember the
> exact reasons why.) We knew when we put in the logtape logic that we
> were trading off speed for space, and we accepted that. It's possible
> that with the growth of hard drive sizes, real-world applications would
> no longer care that much about whether the space required to sort is 4X
> data size rather than 1X. Or then again, maybe their data has grown
> just as fast and they still care.
>

I believe the case of tape sorts that fit entirely in the filesystem cache is an important one as well: doubling (or worse) the amount of data that has to live "on disk" at once would likely hurt badly there, since it could push the sort out of cache.

Also, it's not uncommon for a database server to be IO-bound, so even if storing everything two or more times isn't a problem from a disk-space standpoint, we should still be concerned about the extra IO bandwidth it consumes.
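To make the space trade-off concrete, here is a toy model (a hypothetical sketch, not code from logtape.c) that counts peak disk blocks while merging k equal-sized runs whose blocks are consumed round-robin. With one file per run, no input file can be deleted until it has been fully read, so the peak approaches twice the data size. If each consumed block is instead recycled for output, as in the block-reuse approach logtape.c takes, the footprint stays at the data size. The function name `peak_blocks` and the round-robin consumption order are assumptions for illustration; the model ignores buffering overhead and multiple merge passes, so it does not reproduce the 4X figure from the logtape.c comments.

```c
#include <stdbool.h>

/* Hypothetical toy model: peak number of disk blocks in use while merging
   `k` runs of `b` blocks each into a single output run.
   recycle == false: one file per run; a file is deleted only once fully read.
   recycle == true:  a block is returned to a free pool as soon as it is read,
                     and output blocks reuse pooled blocks before growing. */
long peak_blocks(int k, int b, bool recycle)
{
    long total = (long)k * b;  /* N: total data size in blocks */
    long in_use = total;       /* blocks occupying disk, including pooled ones */
    long free_blocks = 0;      /* pooled blocks that are on disk but reusable */
    long peak = in_use;

    for (long step = 0; step < total; step++) {
        /* Round-robin consumption: step `step` reads from run (step % k),
           and this is block number (step / k + 1) read from that run. */
        long consumed_in_run = step / k + 1;

        if (recycle) {
            /* The block just read stays in the file but becomes reusable. */
            free_blocks++;
        } else if (consumed_in_run == b) {
            /* Last block of this run read: only now can its file go away. */
            in_use -= b;
        }

        /* Write one output block, reusing a pooled block if one exists. */
        if (free_blocks > 0)
            free_blocks--;
        else
            in_use++;

        if (in_use > peak)
            peak = in_use;
    }
    return peak;
}
```

For example, with k = 4 runs of 1000 blocks each, the file-per-run model peaks at 7996 blocks (2N - k), while the recycling model never exceeds 4000. Both models write the same N output blocks per pass, which is why the IO-bandwidth concern is about the larger footprint falling out of cache rather than about the raw write volume of the merge itself.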
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
