Re: wip: functions median and percentile

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wip: functions median and percentile
Date: 2010-10-14 04:53:57
Message-ID: AANLkTik=uxdBr2NRf1iL5R-0jLJzafRY3pDvke1txLAU@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-rrreviewers

2010/10/14 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Wed, Oct 13, 2010 at 6:56 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> 2010/10/13 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>>> 2010/10/13 Peter Eisentraut <peter_e(at)gmx(dot)net>:
>>>> On mån, 2010-10-11 at 20:46 +0200, Pavel Stehule wrote:
>>>>> The problem is in interface. The original patch did it, but I removed
>>>>> it. We cannot to unsure immutability of some parameters now.
>>>>
>>>> How about you store the "immutable" parameter in the transition state
>>>> and error out if it changes between calls?
>>>>
>>>
>>> yes, it's possibility. Now I looking there and I see the as more
>>> important problem the conformance with ANSI SQL. see my last post.
>>> There can be a kind of aggregate functions based on tuplesort.
>>
>> more - all these functions needs to solve same problem with planner
>> and hash agg. So maybe is time to add a flag ISTUPLESORTED to pg_proc
>> and add solve these functions together.
>
> I think that the design of this patch is still sufficiently up in the
> air that it is not going to be practical to get it committed during
> the current CommitFest, which is nearly over, so I am going to mark it
> as Returned with Feedback.  I suggest that we continue discussing it,
> however, so that we can get things squared away for the next
> CommitFest.  It seems that the fundamental question here is whether
> median is an instance of some more general problem, or whether it's a
> special case; and more importantly, if it is an instance of a more
> general problem, what is the shape of that general problem?

+1

Median implemented as special case of some special sort of functions
will be better. The use case of ANSI SQL functions are more general -
but it needs discussion about design. I didn't find too much
consistency in standard. These functions are defined individually -
not as some special kind of functions. All functions from standard has
a immutable parameters - but Oracle listagg function has one parameter
mutable and second immutable.

More we should better to solve using these functions together with
window clause. I didn't find any note about using combination in
standard, but minimally Oracle allows a within_group and over clauses
together.

On second hand - this work can be really useful. We can get a bigger
conformity with ANSI SQL 200x and with other db - DB2, Oracle, MSSQL,
Sybase support this feature.

>
> Or to put it more bluntly - what is the "problem with planner and hash
> agg" that all of these functions need to solve?  And why does it need
> a flag in pg_proc?  Why can't't we leave it to the individual
> functions to perform a sort of one is needed?

These functions are hungry - It takes a 30 kb (minimum tuplesort) per
group. More there is relative general pattern, that can be shared -
there can be minimaly 6 functions, that just fill tuplesort in
iterations - so these code can be shared, tuplesort can be reseted and
used respectively. And it's question if requested sort can be used in
outer optimalizations. Primary thing for solving is memory usage.

Regards

Pavel Stehule

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2010-10-14 05:11:31 Bug in writeTimeLineHistory
Previous Message Pavel Stehule 2010-10-14 04:25:45 Re: string function - "format" function proposal

Browse pgsql-rrreviewers by date

  From Date Subject
Next Message Robert Haas 2010-10-14 12:17:54 Re: wip: functions median and percentile
Previous Message Greg Stark 2010-10-14 02:37:37 Re: wip: functions median and percentile