Re: wip: functions median and percentile

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wip: functions median and percentile
Date: 2010-10-04 19:20:28
Message-ID: AANLkTi=T=8zAQzt88uBb0x7QXkb83OceGMvrh+pL796f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-rrreviewers

On 4 October 2010 18:22, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Oct 4, 2010 at 2:58 AM, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>> That requires a new sort for each row. I generated this with a minor
>> tweak to Pavel's patch to just restart the tuplesort each time (the
>> "quick-fix" solution). The problem is that performance really sucks,
>> because it is an O(n^2 log(n)) algorithm.
>
> Maybe that's OK.  If you're doing repeated median operations on large
> data sets, perhaps you should expect that to be slow.  I bet that
> people who want to use this as a window function will want one median
> per group, not n medians per group; and it doesn't seem like a good
> idea to say - we're not going to let you use this as a window function
> AT ALL because you might decide to do something that will be really
> slow.  You can always hit ^C if you get tired of waiting.  This seems
> like it's very far from being the most important thing for us to
> optimize, though of course it's great if we can.
>

Well FWIW, here's the quick-fix solution to make this median function
support window queries.

It's OK if all you want is one median per partition, but otherwise
it's pretty ugly. It will copy the entire tuplesort for each row in a
running median, which is horrible, but that's actually negligible
compared to the subsequent sort that's going to happen for that row.

I'd actually be tempted to force the tuplesort to stay in memory for a
running median, on the grounds that performance will be so bad for a
large partition that you'd run out of patience waiting for it long
before it would run out of memory :-(

Regards,
Dean

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>

Attachment Content-Type Size
median2.diff application/octet-stream 20.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dan Ports 2010-10-04 19:22:32 Re: MIT benchmarks pgsql multicore (up to 48)performance
Previous Message Tomoaki Sato 2010-10-04 19:18:04 Re: pgadmin3_90 package

Browse pgsql-rrreviewers by date

  From Date Subject
Next Message Marko Tiikkaja 2010-10-04 21:59:07 Re: [HACKERS] top-level DML under CTEs
Previous Message Robert Haas 2010-10-04 17:22:39 Re: wip: functions median and percentile