Re: wip: functions median and percentile

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wip: functions median and percentile
Date: 2010-10-05 12:14:24
Message-ID: AANLkTinSbwJ9TJ_uOHR6VsQOg2yemd1cTstwqVB8096K@mail.gmail.com
Lists: pgsql-hackers pgsql-rrreviewers

2010/10/5 Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>:
> On 5 October 2010 07:04, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> wrote:
> Extrapolating from a few quick timing tests, even in the best case, on
> my machine, it would take 7 days for the running median to use up
> 100MB, and 8 years for it to use 2GB. So setting the tuplesort's
> workMem to 2GB (only in the running median case) would actually be
> safe in practice, and would prevent the temp file leak (for a few
> years at least!). I feel dirty even suggesting that. Better ideas
> anyone?

So, I suggested implementing median as a *pure* window function,
separate from Pavel's aggregate function, and Greg suggested adding an
insertion capability to tuplesort. With this approach, we keep a
tuplesort that holds all the values in the current frame and release
it at the end of the partition (the window function API makes this
possible). Values are added incrementally, which is far better than
O(n^2 log(n)), although I haven't estimated the actual complexity.
Only when the frame head moves down do we have to re-initialize the
tuplesort, and that case is as slow as calling the aggregate version
for each row (but I think we can solve that somehow if we look at it
more closely).
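
To illustrate the incremental idea, here is a minimal sketch in Python
rather than C. It is only a model: a sorted Python list stands in for a
tuplesort with insertion capability, the frame is assumed to run from
the start of the partition to the current row, and the name
running_median is hypothetical, not part of any patch.

```python
from bisect import insort


def running_median(values):
    """Yield the median of values[0..i] for each row i of one partition.

    A single sorted list is kept for the whole partition and each new
    value is inserted into it, instead of re-sorting the frame per row.
    """
    frame = []  # stand-in for the sorted state holding the current frame
    for v in values:
        insort(frame, v)  # insert the new row's value, keeping order
        n = len(frame)
        mid = n // 2
        if n % 2:
            yield frame[mid]  # odd count: the middle element
        else:
            yield (frame[mid - 1] + frame[mid]) / 2  # even count: mean of the two middle elements
    # The list goes out of scope here, which models releasing the sorted
    # state at the end of the partition.


if __name__ == "__main__":
    print(list(running_median([5, 1, 3, 2])))
```

The sketch does not cover the case where the frame head moves down;
there the sorted state would have to be rebuilt, or support removal,
which is the open problem mentioned above.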

Regards,

--
Hitoshi Harada
