Re: Memory usage during sorting

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory usage during sorting
Date: 2012-02-08 09:08:21
Message-ID: CAP7Qgmm6egC9LiD78aUtR7HOy=_6EOVdX2rjtobLLLCTw2JpbQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 4, 2012 at 10:06 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> The worst thing about the current memory usage is probably that big
> machines can't qsort more than 16,777,216 tuples no matter how much
> memory they have, because memtuples has to live entirely within a
> single allocation, which is currently limited to 1GB.  It is probably
> too late to fix this problem for 9.2. I wish I had started there
> rather than here.
>
> This 16,777,216 tuple limitation will get even more unfortunate if the
> specializations that speed up qsort but not external sort get
> accepted.
>

I think it's fair to ask that our 1GB palloc limit be extended to the
64-bit space. Many applications want more memory from a single palloc
call in user-defined functions, aggregates, and so on. As you may have
noticed, however, the parts of postgres that would have to change need
to be investigated deeply. I don't know where it is safe to allow
larger allocations and where it is not. varlena types are unsafe, but
it might be possible to extend the varlena header to 64 bits in some
major release.

> Attached is a completely uncommittable proof of concept/work in
> progress patch to get around the limitation.  It shows a two-fold
> improvement when indexing an integer column on a 50,000,000 row
> randomly ordered table.

In any case, we need a bird's-eye sketch of how to attack this. I think
it is worth doing, and at some point we definitely must do it, though I
don't know whether that will be the next release or the third release
from now.

Thanks,
--
Hitoshi Harada
