Re: wip: functions median and percentile

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wip: functions median and percentile
Date: 2010-10-01 14:44:44
Message-ID: AANLkTinejbxAwrAsHxVWCOs0pyFdm-s5AXpJZ-GuWwY7@mail.gmail.com
Lists: pgsql-hackers pgsql-rrreviewers

On Fri, Oct 1, 2010 at 10:11 AM, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> wrote:
> 2010/10/1 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
>>> 2010/9/26 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>>>> This patch needs a little more work - it could share the comparison
>>>> functionality with tuplesort.c, but I would like to verify the concept now.
>>
>>> Sorry for the delay. I read the patch and the results look sane. For
>>> window function calls, I agree that the current tuplesort is not
>>> enough to implement median functions, so the patch introduces its own
>>> memsort mechanism, although memsort copies too much from
>>> tuplesort. It does not look difficult to me to modify the existing
>>> tuplesort to guarantee staying in memory if the caller specifies an
>>> option to do so. I think that option could be used by other
>>> cases in the core code.
>>
>> If this patch tries to force the entire sort to happen in memory,
>> it is not committable.  What will happen when you get a lot of data?
>> You need to be working on a variant that will work anyway, not working
>> on an unacceptable lobotomization of the main sort code.
>
> What about array_agg()? Doesn't it run out of memory too when a huge amount of data comes in?

So, if you have 512MB of RAM in the box and you build and return a 1GB
array, it's going to be a problem. Period, full stop. The interim
memory consumption cannot be less than the size of the final result.

If you have 512MB of RAM in the box and you want to aggregate 1GB of
data and return a 4-byte integer, it's only a problem if your
implementation is bad.
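
To make the second case concrete, here is a rough sketch (in Python, for
brevity) of a median that never holds more than a fixed number of input
values in memory: it sorts bounded chunks, spills each sorted run to a
temporary file, and then merges the runs only as far as the middle
element. This is purely illustrative and is not how the patch or
tuplesort.c does it; the names bounded_median and max_in_memory are
made up for the example.

```python
import heapq
import itertools
import pickle
import tempfile


def _spill_run(sorted_values):
    """Write one sorted run to a temporary file and rewind it for reading."""
    f = tempfile.TemporaryFile()
    for v in sorted_values:
        pickle.dump(v, f)
    f.seek(0)
    return f


def _read_run(f):
    """Yield the values of a spilled run back one at a time."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def bounded_median(values, max_in_memory=100_000):
    """Return the lower median of an iterable (hypothetical helper).

    At most max_in_memory values are held while building runs; during the
    merge only one value per run is held.
    """
    it = iter(values)
    runs = []
    count = 0
    while True:
        # Pull the next bounded chunk from the input stream.
        chunk = list(itertools.islice(it, max_in_memory))
        if not chunk:
            break
        count += len(chunk)
        chunk.sort()
        runs.append(_spill_run(chunk))
    if count == 0:
        return None
    try:
        # Lazily merge the sorted runs and stop at the middle position.
        merged = heapq.merge(*(_read_run(f) for f in runs))
        # The lower median sits at zero-based position (count - 1) // 2.
        return next(itertools.islice(merged, (count - 1) // 2, None))
    finally:
        for f in runs:
            f.close()
```

The result is a single value no matter how large the input is, so the
working memory is set by max_in_memory rather than by the data size;
the cost is the extra I/O for the spilled runs.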

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
