Re: Memory usage during sorting

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory usage during sorting
Date: 2012-04-13 13:51:27
Message-ID: 22449.1334325087@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Sun, 2012-03-18 at 11:25 -0400, Tom Lane wrote:
>> Yeah, that was me, and it came out of actual user complaints ten or more
>> years back. (It's actually not 2X growth but more like 4X growth
>> according to the comments in logtape.c, though I no longer remember the
>> exact reasons why.) We knew when we put in the logtape logic that we
>> were trading off speed for space, and we accepted that.

> I skimmed through TAOCP, and I didn't find the 4X number you are
> referring to, and I can't think what would cause that, either. The exact
> wording in the comment in logtape.c is "4X the actual data volume", so
> maybe that's just referring to per-tuple overhead?

My recollection is that that was an empirical measurement using the
previous generation of code. It's got nothing to do with per-tuple
overhead IIRC, but with the fact that the same tuple can be sitting on
multiple "tapes" during a polyphase merge, because some of the tapes can
be lying fallow waiting for future use --- but data on them is still
taking up space, if you do nothing to recycle it. The argument in the
comment shows why 2X is the minimum space growth for a plain merge
algorithm, but that's only a minimum.
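As a rough illustration of the 2X minimum, here is a toy block-accounting model of a single merge pass. It is only a sketch and not the logtape.c code: the function name `peak_blocks`, the `recycle` flag, and the one-block-in, one-block-out accounting are all assumptions made for the example. It does not model polyphase tapes lying fallow, which is where growth beyond 2X would come from.

```python
def peak_blocks(run_lengths, recycle):
    """Peak number of disk blocks in use while merging runs into one output.

    Toy model (hypothetical, not PostgreSQL code): every step reads one
    block from an input run and writes one block to the output run.
    With recycle=False, input blocks stay allocated after being read.
    """
    total = sum(run_lengths)
    in_use = total              # all input runs are on disk at the start
    peak = in_use
    for _ in range(total):
        if recycle:
            in_use -= 1         # the block just read is freed for reuse
        in_use += 1             # one block is written to the output run
        peak = max(peak, in_use)
    return peak


runs = [4, 4, 4]                # three input runs of four blocks each
no_recycle = peak_blocks(runs, recycle=False)
with_recycle = peak_blocks(runs, recycle=True)
print(no_recycle, with_recycle)
```

Without recycling, the peak in this model reaches twice the data volume at the end of the pass, because the fully read inputs and the complete output coexist on disk. With blocks freed as they are read, the peak stays at the data volume.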

regards, tom lane
