Re: finding medians

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Burdick <jburdick(at)gradient(dot)cis(dot)upenn(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org, josh(at)agliodbs(dot)com
Subject: Re: finding medians
Date: 2002-05-30 21:45:53
Message-ID: 7896.1022795153@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Josh Burdick <jburdick(at)gradient(dot)cis(dot)upenn(dot)edu> writes:
> illustrates the limitations of the current aggregate function setup,
> which works so nicely for avg() and stddev().
> I don't have any good solutions. I tried using a float4[] to store
> each element as it's added, but I couldn't get array updates working in
> PL/PgSQL, so that didn't help.
> Perhaps aggregate functions could be passed an array? Or a cursor,
> pointing at the first line? I'm not sure.

I don't think that would help. The real problem here is the amount of
internal storage needed. AFAIK there are no exact algorithms for finding
the median that require less than O(N) workspace for N input items.
Your "hack" with a temporary table is not a bad approach if you want to
work for large N.

There are algorithms out there for finding approximate medians using
limited workspace; it might be reasonable to transform one of these into
a Postgres aggregate function.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Christopher Kings-Lynne 2002-05-30 22:06:44 Re: Small changes to facilitate Win32 port
Previous Message Katherine Ward 2002-05-30 21:33:50 Small changes to facilitate Win32 port