Quick Links

Spilling hashed SetOps and aggregates to disk

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Spilling hashed SetOps and aggregates to disk
Date:	2018-06-04 08:32:47
Message-ID:	87be3bd5-6b13-d76e-5618-6db0a4db584d@iki.fi
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

Hash Aggs and SetOps are currently not spilled to disk. If the planner's
estimate on the number of entries is badly off, you might run out of
memory at execution time, if all the entries don't fit in memory.

For HashAggs, this was discussed in depth a couple of years ago at [1].
SetOps have the same issue, but fixing that is simpler, as you don't
need to handle arbitrary aggregate transition values and functions.

So a while back, I started hacking on spilling SetOps, with the idea
that the code to deal with that could later be reused to deal with
HashAggs, too. I didn't get very far, but I'm posting this in this very
unfinished form to show what I've got, because I had a chat on this with
Jeff Davis and some others last week.

The logtape.c interface would be very useful for this. When you start
spilling, you want to create many spill files, so that when reloaded,
each file will fit comfortably in memory. With logtape.c, you can have
many logical tapes, without the overhead of real files. Furthermore, if
you need to re-spill because you a spill file grows too big in the first
pass, logtape.c allows reusing the space "on-the-fly". The only problem
with the current logtape interface is that it requires specifying the
number of "tapes" upfront, when the tapeset is created. However, I was
planning to change that, anyway [2].

[1]
https://www.postgresql.org/message-id/1407706010.6623.16.camel%40jeff-desktop

[2]
https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi

- Heikki

Attachment	Content-Type	Size
0001-Optimize-memory-usage-in-SetOp-executor-node.patch	text/x-patch	5.2 KB
0002-Allow-SetOps-to-spill.patch	text/x-patch	25.2 KB

Responses

Re: Spilling hashed SetOps and aggregates to disk at 2018-06-04 18:52:05 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Pavel Stehule	2018-06-04 09:17:03	Re: plans for PostgreSQL 12
Previous Message	Vik Fearing	2018-06-04 07:59:11	Re: plans for PostgreSQL 12