Re: finding medians

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Hannu Krosing" <hannu(at)tm(dot)ee>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <josh(at)agliodbs(dot)com>
Subject: Re: finding medians
Date: 2002-05-30 22:31:45
Message-ID: D90A5A6C612A39408103E6ECDD77B82920CEE2@voyager.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: Hannu Krosing [mailto:hannu(at)tm(dot)ee]
> Sent: Thursday, May 30, 2002 1:17 PM
> To: Josh Burdick
> Cc: pgsql-hackers(at)postgresql(dot)org; josh(at)agliodbs(dot)com
> Subject: Re: [HACKERS] finding medians
>
>
> On Fri, 2002-05-31 at 01:16, Josh Burdick wrote:
> > BUG: this isn't properly set up to deal with multiple users.
> > For example, if A computes a median, then B could read the data
> > from the median_tmp table. Possibly you could fiddle with
> > transaction isolation levels, or add a user field to median_tmp,
> > or something else complicated, to prevent this, but for now I'm
> > not worrying about this.
>
> You could just use temp tables and indexes - they are local to
> connection .
>
> create TEMP sequence median_id;
> create TEMP table median_tmp (
> median_id int,
> x float4
> );

Another pure SQL solution would be to perform an order by on the column
of interest.

Use a cursor to seek to the middle element. If the data set is odd,
then the median is the center element. If the data set is even, the
median is the average of the two center elements.

A SQL function would be pretty easy. Of course, it is not the most
efficient way to do it. A nice thing about having the data ordered is
that you can then extract the statistic for any kth partition. Hence,
you could generate a function quantile() and call quantile (0.5) to get
the median. If you have a query:
select quantile(.1, col), quantile(.2, col), quantile(.3,col), ...
quantile(.9, col) from sometable
you only have to do the sort once and then operate on the sorted data.
For queries like that, sorting is probably just fine, since the
selection algorithm is only approximately linear for each single
instance.

Browse pgsql-hackers by date

  From Date Subject
Next Message Dann Corbit 2002-05-30 22:34:39 Re: Small changes to facilitate Win32 port
Previous Message Tom Lane 2002-05-30 22:25:03 Re: Small changes to facilitate Win32 port