Skip site navigation (1) Skip section navigation (2)

Re: wip: functions median and percentile

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wip: functions median and percentile
Date: 2010-10-11 14:03:40
Message-ID: 22629.1286805820@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-rrreviewers
Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On 10 October 2010 22:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, as far as the implementation issues go, telling tuplesort that it
>> can use gigabytes of memory no matter what seems quite unacceptable.
>> Put this thing into a hash aggregation and you'll blow out your memory
>> in no time. I don't think it's even a good idea to use work_mem there.

> Argh! Yes that sounds like a much more serious problem.

> Interestingly I couldn't seem to produce this effect. Every effort I
> make to write a query to test this with median ends up being executed
> using a GroupAggregate, while the equivalent query with avg uses a
> HashAggregate. I don't understand why they are being treated
> differently.

If you're using recent sources, there's some code in count_agg_clauses()
that assumes that an aggregate with transtype INTERNAL will use
ALLOCSET_DEFAULT_INITSIZE (ie 8K) workspace.  So that'll discourage the
planner from selecting HashAggregate except for a pretty small number of
groups.  The problem is that there's still a whole lot of daylight
between 8K and 2G, so plenty of room to go wrong.

The other approach that we could take here is to replace the
ALLOCSET_DEFAULT_INITSIZE hack (which is certainly no more than a hack)
with some way for an aggregate to declare how much space it'll eat,
or more simply to mark it as "never use in HashAgg".  This was discussed
earlier but it would require a significant amount of dogwork and no one
was real excited about doing it.  Maybe it's time to bite that bullet
though.  Reflecting on it, I think it'd be best to allow an agg to
provide an estimation function that'd be told the input data type and
expected number of rows --- even on a per-aggregate basis, a constant
estimate just isn't good enough.

			regards, tom lane

In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2010-10-11 14:08:53
Subject: Re: wip: functions median and percentile
Previous:From: Josh KupershmidtDate: 2010-10-11 13:41:35
Subject: Re: column-level update privs + lock table

pgsql-rrreviewers by date

Next:From: Tom LaneDate: 2010-10-11 14:08:53
Subject: Re: wip: functions median and percentile
Previous:From: Robert HaasDate: 2010-10-11 10:33:12
Subject: Re: wip: functions median and percentile

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group