Re: Aggregate-function space leakage

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Chris Spotts <rfusca(at)gmail(dot)com>
Subject: Re: Aggregate-function space leakage
Date: 2009-07-23 12:56:21
Message-ID: e08cc0400907230556x39aea0eci98499dfe24b5e209@mail.gmail.com
Lists: pgsql-hackers

2009/7/23 Greg Stark <gsstark(at)mit(dot)edu>:
> On Wed, Jul 22, 2009 at 10:14 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The reason for that turns out to be that we deliberately lobotomized
>> array_agg that way, just last month:
>> http://archives.postgresql.org/pgsql-committers/2009-06/msg00259.php
>> in response to this problem:
>> http://archives.postgresql.org/pgsql-hackers/2009-06/msg01186.php
>>
>> We need a better idea.
>
>
> Rereading your diagnosis of Merlin Moncure's original problem I'm a
> bit puzzled. Why do we have to rerun the final function when we rescan
> the hash table? Surely the logical thing to do is to store the final
> value in the hash table with some flag saying that value has been
> finalized rather than to reexecute the final function every time it's
> rescanned.
>
> I'm not sure that really solves anything though since there's no
> guarantee that the first scan was finished when it's reset so there
> could still be unfinalized elements in the hash table. Would it be too
> costly to finalize all the hash elements in a single pass before
> returning any?
>

It looks like the Agg node builds the whole hash table before returning
a tuple in hash mode. If it stored all the final results somewhere and
just returned them on rescan, the one issue would be volatile final
functions (and I know that is a rare case). Apart from that it sounds
sane, though I don't know the exact requirements for rescanning and
re-executing in the Executor. If it really needs to release anything
on rescan, this approach also fails.
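
A minimal sketch of the "finalize once, replay on rescan" idea, written as a toy Python model rather than executor code. The class and method names (HashAggModel, scan) are hypothetical and only illustrate the control flow; nothing here mirrors the real Agg node's data structures.

```python
class HashAggModel:
    """Toy model of a hashed aggregate that caches finalized results."""

    def __init__(self, rows, transfn, finalfn, init):
        self.finalfn = finalfn
        self.final_calls = 0
        # Hash mode: build the whole table before returning any group.
        self.table = {}
        for key, value in rows:
            self.table[key] = transfn(self.table.get(key, init), value)
        self.results = None  # filled in by the first scan

    def scan(self):
        # Finalize every group in a single pass, then replay the stored
        # results on every later scan instead of re-running finalfn.
        if self.results is None:
            self.results = {}
            for key, state in self.table.items():
                self.final_calls += 1
                self.results[key] = self.finalfn(state)
        return sorted(self.results.items())


rows = [("a", 1), ("b", 2), ("a", 3)]
agg = HashAggModel(rows, lambda state, v: state + (v,), list, ())
first = agg.scan()
second = agg.scan()  # models a rescan of the already-built hash table
```

Here both scans return the same rows and the final function runs once per group. A volatile final function would be expected to give a fresh value on each rescan, and that is the case this scheme cannot serve.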

So the two ideas from Tom seem a little worse than that to me. Modifying
Agg.c might add overhead from resetting a context group by group, and
forcing array_agg() (i.e. user aggregates) to distinguish hash mode from
group mode is definitely a heavy burden on users. The real problem here is
how and when to release the transvalue stored by aggregates that use the
new method introduced in 8.4 with array_agg(), which is to pass pointers
through the transfunc's arguments. Maybe array_agg should not do what was
introduced in 8.4. We could possibly go back to array_accum().
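
To make the contrast concrete, here is a toy Python model of the two styles. It is only an illustration under assumptions: the function names are hypothetical, a tuple stands in for a transvalue that is copied on every call (array_accum style), and a mutable list stands in for working state reached through a pointer (the 8.4 array_agg style).

```python
def accum_trans(state, value):
    # array_accum style: each call returns a fresh value, nothing is shared.
    return state + (value,)


def accum_final(state):
    # Nothing to release, so running this twice is harmless.
    return list(state)


def inplace_trans(state, value):
    # Pointer style: mutate the working state in place and hand it back.
    state.append(value)
    return state


def inplace_final_releasing(state):
    result = list(state)
    state.clear()  # models a final function that frees its working state
    return result


values = [1, 2, 3]

copied = ()
for v in values:
    copied = accum_trans(copied, v)

shared = []
for v in values:
    shared = inplace_trans(shared, v)

# Run each final function twice, as a rescan in hash mode would.
copy_runs = [accum_final(copied), accum_final(copied)]
shared_runs = [inplace_final_releasing(shared), inplace_final_releasing(shared)]
```

In this model the copying style survives a second final call, while the in-place style loses its data once the final function has released it. The price of the copying style is that every transition rebuilds the accumulated value.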

Regards,

--
Hitoshi Harada
