Skip site navigation (1) Skip section navigation (2)

Re: Odd out of memory problem.

From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Odd out of memory problem.
Date: 2012-03-26 23:53:10
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
On Mon, Mar 26, 2012 at 6:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <stark(at)mit(dot)edu> writes:
>> I have a sketch for how to handle spilling hash aggregates to disk in
>> my head. I'm not sure if it's worth the amount of complexity it would
>> require but I'll poke around a bit and see if it works out well.
> It'd be awfully nice if those could spill to disk.  I think that
> currently that's the only plan type where a misestimate can lead to
> hard failure rather than just slower-than-you'd-like.  Which is not
> nice considering that the estimates are necessarily just estimates.
> Could you give us a brain dump on the sketch?  I've never seen how to
> do it without unreasonable overhead.

Hm. So my original plan was dependent on adding the state-merge
function we've talked about in the past. Not all aggregate functions
necessarily can support such a function but I think all or nearly all
the builtin aggregates can. Certainly min,max, count, sum, avg,
stddev, array_agg can which are most of what people do. That would be
a function which can take two state variables and produce a new state

If you have this then you can sort and spill the hash table and start
a new hash table and keep going. When you're done you merge the hash
tables using something like heap merge but applying the state merge

However now that I've started thinking about it further I think you
could solve it with less complexity by cheating in various ways. For
example if you limit the hash size to 1/2 of work_mem then you when
you reach that limit you could just stuff any tuple that doesn't match
a hash entry into a tuplesort with 1/2 of work_mem and do the regular
level break logic on the output of that.

Or if you can count on tuplesort to be a stable sort (and I think it
might be already) then you could just spill the sorted hash entries
and then switch to doing a tapesort for the rest. When you finish
merging you can read them back and do level break logic like normal.
If there is a partially computed state it will be the first one in the
equal group. If you can't count on tuplesort to be stable you could
dump the partially computed states to a different tape and do a merge
between it and the sorted output of the main data set.

The holy grail of this kind of merging of the two algorithms would be
something that keeps the hash table going and maintains an lru of hash
entries. When it grows too large it would spill the oldest partial
state. Then it would sort those states and merge them (possibly in a
single step).  That might be too complex to pull its weight given some
of the above sketches are probably simple enough.


In response to


pgsql-hackers by date

Next:From: Robert HaasDate: 2012-03-26 23:53:25
Subject: Re: Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)
Previous:From: Tom LaneDate: 2012-03-26 23:39:30
Subject: Re: 9.2 commitfest closure (was Command Triggers, v16)

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group