Quick Links

Re: wip: functions median and percentile

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: wip: functions median and percentile
Date:	2010-10-11 06:58:53
Message-ID:	AANLkTi=wB_K0JYytVQC8g5As=xxzQAuKEfqhQcTXHaFT@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-rrreviewers

2010/10/10 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> In the meantime, the attached variation of the patch fixes the temp
>> file issue and will support all 3 cases. It gives OK performance for
>> (1) and (2), and poor performance for (3). That could be viewed as a
>> future development task, which perhaps the window function API would
>> help with. I think it would be a shame to drop support for (2) just
>> because we can't do (3) efficiently yet.
>
> I started looking at this patch, and noticed that we got so caught up
> in implementation issues that we forgot the unresolved problem of data
> types. The original patch defined median(anyelement), which is only
> well-defined for an odd number of inputs; for an even number of inputs
> you have to take the left or right item in the central pair. I see
> that this version defines
> median(float8) returns float8
> median(float4) returns float8
> median(numeric) returns numeric
> median(int8) returns numeric
> median(int4) returns numeric
> median(int2) returns numeric
> which strikes me as possibly being overkill.
>
> It was pointed out upthread that while median isn't presently
> in the standard, Oracle defines it in terms of percentile_cont(0.5)
> which *is* in the standard. What I read in SQL:2008 is that
> percentile_cont is defined for all numeric types (returning
> approximate numeric with implementation-defined precision),
> and for interval (returning interval), and not for any other
> input type. So it appears to me that what we ought to support
> is
> median(float8) returns float8
> median(interval) returns interval
> and nothing else --- we can rely on implicit casting to convert
> any other numeric input type to float8.

>
> BTW, as far as the implementation issues go, telling tuplesort that it
> can use gigabytes of memory no matter what seems quite unacceptable.
> Put this thing into a hash aggregation and you'll blow out your memory
> in no time. I don't think it's even a good idea to use work_mem there.
> I wonder whether it'd be a good idea to augment AggCheckCallContext()
> so that there's a way for aggregates to find out how much memory they
> ought to try to use. In a simple aggregation situation it's probably
> OK to use work_mem, but in a hash aggregation you'd better use less
> --- perhaps work_mem divided by the number of groups expected.
>
> Also, I believe that the lack-of-cleanup problem for tuplesorts spilling
> to disk should be fixable by using an exprcontext shutdown callback (see
> RegisterExprContextCallback).

this issue is only for window aggregates, and this way is block - the
problem with tuplestore is not only wit cleanup, but more in
performance. But using a MemoryContext shutdown callback is good idea.

Regards

Pavel Stehule

>
> Comments?
>
> regards, tom lane
>

In response to

Re: wip: functions median and percentile at 2010-10-10 21:16:59 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Dimitri Fontaine	2010-10-11 07:20:15	Re: Debugging initdb breakage
Previous Message	David Fetter	2010-10-11 06:46:15	Support for JDBC setQueryTimeout, et al.

Browse pgsql-rrreviewers by date

	From	Date	Subject
Next Message	Dean Rasheed	2010-10-11 09:35:13	Re: wip: functions median and percentile
Previous Message	Tom Lane	2010-10-10 21:16:59	Re: wip: functions median and percentile