|From:||Heikki Linnakangas <hlinnaka(at)iki(dot)fi>|
|To:||pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>|
|Subject:||Spilling hashed SetOps and aggregates to disk|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Hash Aggs and SetOps are currently not spilled to disk. If the planner's
estimate on the number of entries is badly off, you might run out of
memory at execution time, if all the entries don't fit in memory.
For HashAggs, this was discussed in depth a couple of years ago at .
SetOps have the same issue, but fixing that is simpler, as you don't
need to handle arbitrary aggregate transition values and functions.
So a while back, I started hacking on spilling SetOps, with the idea
that the code to deal with that could later be reused to deal with
HashAggs, too. I didn't get very far, but I'm posting this in this very
unfinished form to show what I've got, because I had a chat on this with
Jeff Davis and some others last week.
The logtape.c interface would be very useful for this. When you start
spilling, you want to create many spill files, so that when reloaded,
each file will fit comfortably in memory. With logtape.c, you can have
many logical tapes, without the overhead of real files. Furthermore, if
you need to re-spill because you a spill file grows too big in the first
pass, logtape.c allows reusing the space "on-the-fly". The only problem
with the current logtape interface is that it requires specifying the
number of "tapes" upfront, when the tapeset is created. However, I was
planning to change that, anyway .
|Next Message||Pavel Stehule||2018-06-04 09:17:03||Re: plans for PostgreSQL 12|
|Previous Message||Vik Fearing||2018-06-04 07:59:11||Re: plans for PostgreSQL 12|