Re: wip: functions median and percentile

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wip: functions median and percentile
Date: 2010-10-11 09:35:13
Message-ID: AANLkTikFwmC6h+sV4aGvQaFKSr5OJq0qE9TjHpfT9ncn@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-rrreviewers

On 10 October 2010 22:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> It was pointed out upthread that while median isn't presently
> in the standard, Oracle defines it in terms of percentile_cont(0.5)
> which *is* in the standard.  What I read in SQL:2008 is that
> percentile_cont is defined for all numeric types (returning
> approximate numeric with implementation-defined precision),
> and for interval (returning interval), and not for any other
> input type.  So it appears to me that what we ought to support
> is
>        median(float8) returns float8
>        median(interval) returns interval
> and nothing else --- we can rely on implicit casting to convert
> any other numeric input type to float8.
>

Yeah that would be much simpler.

BTW, why has percentile been removed from this patch? As the more
general, and SQL standard function, that would seem to be the more
useful one to include. Upthread it was mentioned that there is already
an ntile window function, but actually that's a completely different
thing.

> BTW, as far as the implementation issues go, telling tuplesort that it
> can use gigabytes of memory no matter what seems quite unacceptable.
> Put this thing into a hash aggregation and you'll blow out your memory
> in no time.  I don't think it's even a good idea to use work_mem there.

Argh! Yes that sounds like a much more serious problem.

Interestingly I couldn't seem to produce this effect. Every effort I
make to write a query to test this with median ends up being executed
using a GroupAggregate, while the equivalent query with avg uses a
HashAggregate. I don't understand why they are being treated
differently.

> I wonder whether it'd be a good idea to augment AggCheckCallContext()
> so that there's a way for aggregates to find out how much memory they
> ought to try to use.  In a simple aggregation situation it's probably
> OK to use work_mem, but in a hash aggregation you'd better use less
> --- perhaps work_mem divided by the number of groups expected.

Wouldn't that risk not allowing any memory at all the to aggregate in
some cases? I don't have a better idea mind you, short of somehow not
allowing hash aggregation for this function.

> Also, I believe that the lack-of-cleanup problem for tuplesorts spilling
> to disk should be fixable by using an exprcontext shutdown callback (see
> RegisterExprContextCallback).

Ah! I wasn't aware of such a callback. Sounds perfect for the job.

Regards,
Dean

>
> Comments?
>
>                        regards, tom lane
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2010-10-11 09:55:16 Re: wip: functions median and percentile
Previous Message Dean Rasheed 2010-10-11 07:54:31 Re: WIP: Triggers on VIEWs

Browse pgsql-rrreviewers by date

  From Date Subject
Next Message Pavel Stehule 2010-10-11 09:55:16 Re: wip: functions median and percentile
Previous Message Pavel Stehule 2010-10-11 06:58:53 Re: wip: functions median and percentile