Quick Links

Re: wip: functions median and percentile

From:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: wip: functions median and percentile
Date:	2010-10-11 15:30:46
Message-ID:	AANLkTik0bVJoJiMoaUgScAs-kKP=iLQ9uE_h75_FoXBc@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-rrreviewers

On 11 October 2010 15:03, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> On 10 October 2010 22:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> BTW, as far as the implementation issues go, telling tuplesort that it
>>> can use gigabytes of memory no matter what seems quite unacceptable.
>>> Put this thing into a hash aggregation and you'll blow out your memory
>>> in no time. I don't think it's even a good idea to use work_mem there.
>
>> Argh! Yes that sounds like a much more serious problem.
>
>> Interestingly I couldn't seem to produce this effect. Every effort I
>> make to write a query to test this with median ends up being executed
>> using a GroupAggregate, while the equivalent query with avg uses a
>> HashAggregate. I don't understand why they are being treated
>> differently.
>
> If you're using recent sources, there's some code in count_agg_clauses()
> that assumes that an aggregate with transtype INTERNAL will use
> ALLOCSET_DEFAULT_INITSIZE (ie 8K) workspace. So that'll discourage the
> planner from selecting HashAggregate except for a pretty small number of
> groups. The problem is that there's still a whole lot of daylight
> between 8K and 2G, so plenty of room to go wrong.
>
> The other approach that we could take here is to replace the
> ALLOCSET_DEFAULT_INITSIZE hack (which is certainly no more than a hack)
> with some way for an aggregate to declare how much space it'll eat,
> or more simply to mark it as "never use in HashAgg". This was discussed
> earlier but it would require a significant amount of dogwork and no one
> was real excited about doing it. Maybe it's time to bite that bullet
> though. Reflecting on it, I think it'd be best to allow an agg to
> provide an estimation function that'd be told the input data type and
> expected number of rows --- even on a per-aggregate basis, a constant
> estimate just isn't good enough.

How good will that estimate of the number of rows be though? If
they're coming from a SRF it could be a huge under-estimate, and you'd
still risk eating all the memory, if you allowed a hash aggregate.

Regards,
Dean

>
> regards, tom lane
>

In response to

Re: wip: functions median and percentile at 2010-10-11 14:03:40 from Tom Lane

Responses

Re: wip: functions median and percentile at 2010-10-11 15:44:08 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Leonardo Francalanci	2010-10-11 15:30:54	Re: On the usefulness of hint bits
Previous Message	Robert Haas	2010-10-11 15:20:41	Re: On the usefulness of hint bits

Browse pgsql-rrreviewers by date

	From	Date	Subject
Next Message	Tom Lane	2010-10-11 15:44:08	Re: wip: functions median and percentile
Previous Message	Robert Haas	2010-10-11 14:29:38	Re: wip: functions median and percentile