Re: Odd out of memory problem.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Odd out of memory problem.
Date: 2012-03-27 00:11:41
Message-ID: 20576.1332807101@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> On Mon, Mar 26, 2012 at 6:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Could you give us a brain dump on the sketch? I've never seen how to
>> do it without unreasonable overhead.

> Hm. So my original plan was dependent on adding the state-merge
> function we've talked about in the past. Not all aggregate functions
> necessarily can support such a function but I think all or nearly all
> the builtin aggregates can. Certainly min,max, count, sum, avg,
> stddev, array_agg can which are most of what people do. That would be
> a function which can take two state variables and produce a new state
> variable.

I'd rather not invent new requirements for aggregate implementations
if we can avoid it.

> However now that I've started thinking about it further I think you
> could solve it with less complexity by cheating in various ways. For
> example if you limit the hash size to 1/2 of work_mem then you when
> you reach that limit you could just stuff any tuple that doesn't match
> a hash entry into a tuplesort with 1/2 of work_mem and do the regular
> level break logic on the output of that.

Or just start dumping such tuples into a tuplestore, while continuing to
process tuples that match the hashagg entries that are already in
existence. Once the input is exhausted, read out the hashagg entries we
have, flush the hashagg table, start reading from the tuplestore.
Repeat as needed.

I like this idea because the only thing you give up is predictability of
the order of output of aggregated entries, which is something that a
hashagg isn't guaranteeing anyway. In particular, we would still have a
guarantee that any one aggregate evaluation processes the matching
tuples in arrival order, which is critical for some aggregates.

The main problem I can see is that if we start to flush after work_mem
is X% full, we're essentially hoping that the state values for the
existing aggregates won't grow by more than 1-X%, which is safe for many
common aggregates but fails for some like array_agg(). Ultimately, for
ones like that, it'd probably be best to never consider hashing at all.
I guess we could invent an "unsafe for hash aggregation" flag for
aggregates that have unbounded state-size requirements.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-03-27 00:19:21 Re: Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)
Previous Message Tom Lane 2012-03-26 23:57:25 Re: Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)