From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: wip: functions median and percentile |
Date: | 2010-10-11 17:34:51 |
Message-ID: | AANLkTimOirWayuAkF=NvbMQ+4TpxrEA1LEMNZxxm5+k3@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-rrreviewers |
On 11 October 2010 16:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> On 11 October 2010 15:03, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Reflecting on it, I think it'd be best to allow an agg to
>>> provide an estimation function that'd be told the input data type and
>>> expected number of rows --- even on a per-aggregate basis, a constant
>>> estimate just isn't good enough.
>
>> How good will that estimate of the number of rows be though?
>
> It can't possibly be any worse than a hard-wired constant ;-)
>
>> If they're coming from a SRF it could be a huge under-estimate, and you'd
>> still risk eating all the memory, if you allowed a hash aggregate.
>
> If, for a particular aggregate, you're too chicken to ever allow hash
> aggregation, you could just return a very large number from the
> estimation hook function. I doubt that's a very useful behavior in the
> majority of cases though.
>
Yeah, for median I'd be too chicken :-)
set work_mem = '2MB';
explain analyse select median(i) from generate_series(1,40000) as g(i)
group by i;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=15.00..17.50 rows=200 width=4) (actual
time=229.524..287.153 rows=40000 loops=1)
-> Function Scan on generate_series g (cost=0.00..10.00 rows=1000
width=4) (actual time=8.738..16.595 rows=40000 loops=1)
Total runtime: 811.460 ms
(3 rows)
The estimate of 200 x 8K is below work_mem, so it uses a hash
aggregate. In reality, each tuplesort allocates around 30K initially,
so it very quickly uses over 1GB. A better estimate for the aggregate
wouldn't improve this situation much.
Regards,
Dean
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2010-10-11 17:37:11 | Re: wip: functions median and percentile |
Previous Message | Tom Lane | 2010-10-11 17:30:50 | Re: pgxs docdir question |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2010-10-11 17:37:11 | Re: wip: functions median and percentile |
Previous Message | Tom Lane | 2010-10-11 15:44:08 | Re: wip: functions median and percentile |