Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

From: David Fetter <david(at)fetter(dot)org>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Date: 2013-07-18 04:32:12
Message-ID: 20130718043212.GB30509@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 18, 2013 at 03:15:14AM +0000, Andrew Gierth wrote:
> The spec defines two types of aggregate function classed as "ordered set
> function", as follows:
>
> 1. An "inverse distribution function" taking one argument (which must be
> a grouped column or otherwise constant within groups) plus a sorted
> group with exactly one column:
>
> =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ...
>
> The motivating example for this (and the only ones in the spec) are
> percentile_cont and percentile_disc, to return a percentile result
> from a continuous or discrete distribution. (Thus
> percentile_cont(0.5) within group (order by x) is the spec's version
> of a median(x) function.)
>
> 2. A "hypothetical set function" taking N arguments of arbitrary types
> (a la VARIADIC "any", rather than a fixed list) plus a sorted group
> with N columns of matching types:
>
> =# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...)) from ...
>
> (where typeof(p1)==typeof(q1) and so on, at least up to trivial
> conversions)
>
> The motivating example here is to be able to do rank(p1,p2,...) to
> return the rank that the specified values would have had if they were
> added to the group.
>
> As usual, we do not want to constrain ourselves to supporting only the
> specific cases in the spec, but would prefer a general solution.
>
> We (meaning myself and Atri) have an implementation that basically
> works, though it is not yet complete, but before taking it any further
> we need to resolve the design question of how to represent these two
> types of function in the system catalogs. The fact that there are in
> effect two parts to the parameter list, which are either independent
> (for inverse distribution funcs) or closely related (for hypothetical
> set functions), doesn't seem to point to an obvious way to represent
> this in pg_proc/pg_aggregate.
>
> I'm not yet satisfied with the method used in our implementation,

What is that method?

> so we're throwing this open for suggestions. We will post the
> work-in-progress patch along with a description of its current
> implementation shortly.
>
> One of the major complications is that we ideally want to be able to
> do polymorphism based on the type of the sorted group, specifically
> in order to be able to do
>
> percentile_disc(float8) within group (order by anyelement)
>
> returning anyelement. (i.e. we should be able to get a discrete
> percentile from any type that is orderable.) The question here is
> how to resolve the return type both of the aggregate itself and of
> the finalfn.
>
> We've also had an expression of interest in extending this to allow
> percentile_disc(float8[]) and percentile_cont(float8[]) returning
> arrays; e.g. percentile_cont(array[0, 0.25, 0.5, 0.75, 1]) to return
> an array containing the bounds, median and quartiles in one go. This
> is an extension to the spec but it seems sufficiently obviously
> useful to be worth supporting.
>
> Comments?

I'm really happy to see PostgreSQL come into its own when it comes to
the analytics side of the house :)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hitoshi Harada 2013-07-18 04:39:37 Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Previous Message Tom Lane 2013-07-18 04:27:21 pgindent behavior we could do without