Re: BUG #4916: wish: more statistical functions (median, percentiles etc)

From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-11 14:12:42
Message-ID: h3a6kq$ekn$1@reversiblemaps.ath.cx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-general

On 2009-07-10, Richard Neill <rn214(at)cam(dot)ac(dot)uk> wrote:
>
> The following bug has been logged online:
>
> Bug reference: 4916
> Logged by: Richard Neill
> Email address: rn214(at)cam(dot)ac(dot)uk
> PostgreSQL version: 8.4
> Operating system: Linux
> Description: wish: more statistical functions (median, percentiles
> etc)
> Details:
>
> In addition to the existing aggregate functions (avg, stddev etc), it would
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.
>
> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].
>
> According to google, this has been a wish since at least year 2000 for
> various people, but doesn't seem to be implemented.
>
> Thanks - Richard

percentile isn't really an agregate function as its value is mainly
determined by two (or fewer) values in the data set.

here's a function that works on arbitrarily named tables,
by building queries to get the needed statistics and then
using order by, offset, limit. Exploiting
an index (if present) to retrtieve the data values.

it won't work (well) on subqueries, but as 'col' and tbl are substitutd in
unescaped you can use joins and expressions if needed (expect a
performance hit if you do!)

-- percentile function:
-- usage example: select percentile('sometable','somecolumn','true', 0.45);
-- 'true' is a where constraint, 0.45 gets the 45th percentile.
--
-- for best preformance have an index that covers the where condition
-- and col.

CREATE OR REPLACE FUNCTION percentile ( tbl text, col text, whr text, frc double precision ) returns double precision as $$
DECLARE
cnt integer; -- count of records
rat double precision; -- ratio
pos double precision; -- postion
res double precision; -- result
BEGIN
EXECUTE 'select count( '|| col ||' ) FROM '|| tbl ||' WHERE '|| whr INTO cnt;
pos=frc*(cnt-1);
rat=mod(pos::numeric,1);
EXECUTE 'SELECT (SELECT '||col||' * '||(1-rat)::text
|| ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC '
|| 'OFFSET ' || floor(pos)::text || ' LIMIT 1 )'
|| ' + (SELECT '||col||' * '||rat::text
|| ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC '
|| 'OFFSET ' || ceil(pos)::text || ' LIMIT 1 )' into RES;
RETURN res;
END;
$$ LANGUAGE PLPGSQL;

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Joe Conway 2009-07-12 09:51:39 Re: BUG #4916: wish: more statistical functions (median, percentiles etc)
Previous Message Pavel Stehule 2009-07-11 06:34:12 Re: BUG #4916: wish: more statistical functions (median, percentiles etc)

Browse pgsql-general by date

  From Date Subject
Next Message Martin Gainty 2009-07-11 14:31:57 Re: Using Postgres to store genetic data
Previous Message Raymond O'Donnell 2009-07-11 14:03:24 Re: Question