Re: wip: functions median and percentile

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wip: functions median and percentile
Date: 2010-10-13 10:36:59
Message-ID: AANLkTi=UH-KAkEqMPCe4oRnuptEYyp7uVSnjzw+nFUNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-rrreviewers

Hello

I am looking on SQL standard for some info about "within group"
clause. This clause is necessary for functions:

rank, dense_rank, cume_dist, percent_rank and percentile_disc and
persentile_cont. These functions needs a clause "WITHIN GROUP".

If I understand, then these functions are not simple aggregates - its
some between window functions and aggregates.

Questions:

* is clause "WITHIN GROUP" just syntactic sugar for our aggregate with
ORDER BY? I am thinking so not. There are predefined set of functions
that can be used with this clause.

* what is correct implementation of these functions? When I am
looking on parameters, these functions are very similar to window
functions. So there are two two ways for implementation. Implement it
as special case of window functions or implement it as special case of
aggregates.

Regards

Pavel Stehule

2010/10/12 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
> 2010/10/12 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>> Hello
>>
>> 2010/10/11 Greg Stark <gsstark(at)mit(dot)edu>:
>>> On Sun, Oct 10, 2010 at 2:16 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> It was pointed out upthread that while median isn't presently
>>>> in the standard, Oracle defines it in terms of percentile_cont(0.5)
>>>> which *is* in the standard.
>>>
>>> Uhmm, then why don't we implement that? We could provide median() as a
>>> short-cut but percentile_cont() doesn't sound much harder to implement
>>> than median() and more general.
>>
>> The problem is in interface. The original patch did it, but I removed
>> it. We cannot to unsure immutability of some parameters now. Can we
>> enhance a AGGREGATE to allow some mark like IMMUTABLE parameter and
>> probably we should to support ANSI syntax:
>>
>> PERCENTILE_CONT ( expression1 )
>> WITHIN GROUP ( ORDER BY expression2 [ ASC | DESC ] )
>>
>> This syntax allows to divide a muttable and immutable parameters.
>
> If this is only a syntax sugar for mutable/immutable parameter, then I
> guess it's time to take it serious to implement in our syntax,
> although I'm not sure if it affects more execution model than
> interface.
>
> Regards,
>
>
>
> --
> Hitoshi Harada
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2010-10-13 10:41:52 Re: Slow count(*) again...
Previous Message Neil Whelchel 2010-10-13 10:16:11 Re: Slow count(*) again...

Browse pgsql-rrreviewers by date

  From Date Subject
Next Message Peter Eisentraut 2010-10-13 10:48:17 Re: wip: functions median and percentile
Previous Message Pavel Stehule 2010-10-12 08:44:20 Re: wip: functions median and percentile