Skip site navigation (1) Skip section navigation (2)

Re: wip: functions median and percentile

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wip: functions median and percentile
Date: 2010-10-04 19:20:28
Message-ID: AANLkTi=T=8zAQzt88uBb0x7QXkb83OceGMvrh+pL796f@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-rrreviewers
On 4 October 2010 18:22, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Oct 4, 2010 at 2:58 AM, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>> That requires a new sort for each row. I generated this with a minor
>> tweak to Pavel's patch to just restart the tuplesort each time (the
>> "quick-fix" solution). The problem is that performance really sucks,
>> because it is an O(n^2 log(n)) algorithm.
>
> Maybe that's OK.  If you're doing repeated median operations on large
> data sets, perhaps you should expect that to be slow.  I bet that
> people who want to use this as a window function will want one median
> per group, not n medians per group; and it doesn't seem like a good
> idea to say - we're not going to let you use this as a window function
> AT ALL because you might decide to do something that will be really
> slow.  You can always hit ^C if you get tired of waiting.  This seems
> like it's very far from being the most important thing for us to
> optimize, though of course it's great if we can.
>

Well FWIW, here's the quick-fix solution to make this median function
support window queries.

It's OK if all you want is one median per partition, but otherwise
it's pretty ugly. It will copy the entire tuplesort for each row in a
running median, which is horrible, but that's actually negligible
compared to the subsequent sort that's going to happen for that row.

I'd actually be tempted to force the tuplesort to stay in memory for a
running median, on the grounds that performance will be so bad for a
large partition that you'd run out of patience waiting for it long
before it would run out of memory :-(


Regards,
Dean


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>

Attachment: median2.diff
Description: application/octet-stream (20.2 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Dan PortsDate: 2010-10-04 19:22:32
Subject: Re: MIT benchmarks pgsql multicore (up to 48)performance
Previous:From: Tomoaki SatoDate: 2010-10-04 19:18:04
Subject: Re: pgadmin3_90 package

pgsql-rrreviewers by date

Next:From: Marko TiikkajaDate: 2010-10-04 21:59:07
Subject: Re: [HACKERS] top-level DML under CTEs
Previous:From: Robert HaasDate: 2010-10-04 17:22:39
Subject: Re: wip: functions median and percentile

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group