Re: Spilling hashed SetOps and aggregates to disk

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-11 15:29:52
Message-ID: 03e73902-e84b-d208-b899-05c791b4ada6@2ndquadrant.com
Lists: pgsql-hackers

On 06/11/2018 05:16 PM, Robert Haas wrote:
> On Wed, Jun 6, 2018 at 8:16 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> ... and this is pretty much what Jeff Davis suggested, I think. The
>> trouble is, each of those cases behaves nicely/terribly in different
>> corner cases.
>
> That's a really good point. If the number of groups is pretty small
> compared to the number of input tuples, then you really only ever want
> to dump out transition values. By doing so, you minimize the amount
> of data you have to write. But if the number of groups is, say, half
> the number of input tuples, then computing transition values before
> you have all the values that belong to that group is probably a waste
> of time. I wonder if we could decide what to do by comparing the
> number of input tuples consumed to the number of groups created at the
> time we run out of memory. If we've got less than, I dunno, five
> tuples per group, then assume most groups are small. Pick a strategy
> that (mainly) spools input tuples. Otherwise, pick a strategy that
> spools transition tuples.
>
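FWIW the decision itself would be almost trivial to code up. A toy
sketch (not executor code; the two counters would come from the hash
table bookkeeping, and 5.0 is just the arbitrary cutoff from above):

#include <stdio.h>

typedef enum
{
    SPOOL_INPUT_TUPLES,        /* few tuples per group */
    SPOOL_TRANSITION_TUPLES    /* many tuples per group */
} SpillStrategy;

static SpillStrategy
choose_spill_strategy(double tuples_consumed, double groups_created)
{
    /* fewer than ~5 tuples per group: most groups are small, so
     * spooling raw input tuples wastes little work */
    if (groups_created > 0 &&
        tuples_consumed / groups_created < 5.0)
        return SPOOL_INPUT_TUPLES;

    /* otherwise the transition values compress the input usefully */
    return SPOOL_TRANSITION_TUPLES;
}

int
main(void)
{
    /* 1M tuples in 900k groups => ~1.1 tuples/group => spool input */
    printf("%d\n", choose_spill_strategy(1000000.0, 900000.0));
    return 0;
}
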
> In either case, it seems like we can pick a pure hashing strategy or
> switch to using both hashing and sorting. For example, IIUC, Andres's
> proposal involves spooling mostly input tuples, but can also spool
> transition tuples if the transition values consume more memory as they
> absorb more tuples. However, you could do a similar kind of thing
> without needing sort support. When you see a value that doesn't fit
> in your in-memory hash table, use the hash code to route it to one
> of N batch files. Have a second set of batch files for transition
> tuples in case you need to kick things out of the in-memory hash
> table. Once you finish reading the input, emit all the values that
> remain in the in-memory hash table and then process each batch file
> separately.
>
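Right, and the routing part of that is simple enough. Roughly (a
standalone toy with made-up names, nothing tied to the actual executor
structures; error handling omitted):

#include <stdio.h>
#include <stdint.h>

#define NBATCHES 8

static FILE *input_batch[NBATCHES];   /* spooled raw input tuples */
static FILE *trans_batch[NBATCHES];   /* evicted transition states */

static void
open_batches(void)
{
    char name[32];

    for (int i = 0; i < NBATCHES; i++)
    {
        snprintf(name, sizeof(name), "agg_input_%d.tmp", i);
        input_batch[i] = fopen(name, "w+b");
        snprintf(name, sizeof(name), "agg_trans_%d.tmp", i);
        trans_batch[i] = fopen(name, "w+b");
    }
}

/* group not resident in the hash table: spool the raw input tuple,
 * so all tuples of one group end up in the same batch */
static void
spool_input_tuple(uint32_t hash, const void *tuple, size_t len)
{
    fwrite(tuple, len, 1, input_batch[hash % NBATCHES]);
}

/* kicking a group out of the hash table: spool its transition state
 * to the parallel batch file for the same hash range */
static void
spool_transition_state(uint32_t hash, const void *state, size_t len)
{
    fwrite(state, len, 1, trans_batch[hash % NBATCHES]);
}

int
main(void)
{
    int64_t dummy = 42;

    open_batches();
    spool_input_tuple(0xdeadbeef, &dummy, sizeof(dummy));
    spool_transition_state(0xdeadbeef, &dummy, sizeof(dummy));
    return 0;
}
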
> Similarly, David's strategy involves spooling only transition tuples
> and then sorting on the group key, but it's again possible to do
> something similar without relying on sorting. Instead of flushing the
> in-memory hash table to a tuple store, split the transition tuples it
> contains among N batch files based on the hash code. Once you've read
> all of the input, go back and reprocess each batch file, combining
> transition values whenever the same group keys appear in more than one
> transition tuple.
>

Yeah, essentially something like the batching in hash joins.
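
For instance, something like this (again just a toy sketch -- the
fixed-size table, the int64 state and the "+" combine step stand in
for real group keys, transition states and combine functions; each
batch is assumed to fit in memory, which is the point of choosing N
large enough):

#include <stdio.h>
#include <stdint.h>

typedef struct
{
    uint64_t key;     /* group key, simplified */
    int64_t  state;   /* transition state, e.g. a running sum */
} TransTuple;

#define TABLE_SIZE 1024

/* reprocess one batch: rebuild a hash table from the spooled
 * transition tuples, combining states whenever a key repeats */
static void
reprocess_batch(FILE *batch)
{
    TransTuple table[TABLE_SIZE];
    int        used[TABLE_SIZE] = {0};
    TransTuple t;

    rewind(batch);
    while (fread(&t, sizeof(t), 1, batch) == 1)
    {
        uint64_t slot = t.key % TABLE_SIZE;

        while (used[slot] && table[slot].key != t.key)
            slot = (slot + 1) % TABLE_SIZE;   /* linear probing */

        if (used[slot])
            table[slot].state += t.state;     /* combine duplicates */
        else
        {
            table[slot] = t;
            used[slot] = 1;
        }
    }

    for (int i = 0; i < TABLE_SIZE; i++)
        if (used[i])
            printf("group %llu => %lld\n",
                   (unsigned long long) table[i].key,
                   (long long) table[i].state);
}

int
main(void)
{
    FILE      *batch = tmpfile();
    TransTuple in[] = {{1, 10}, {2, 5}, {1, 7}};  /* key 1 twice */

    fwrite(in, sizeof(in[0]), 3, batch);
    reprocess_batch(batch);    /* prints 1 => 17 and 2 => 5 */
    return 0;
}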

> To me, the pure-hashing strategies look a little simpler, but maybe
> there's enough performance benefit from combining hashing and sorting
> that it's worth the complexity, or maybe we should just accept
> whichever variant somebody's willing to code. But I think we almost
> have to have separate handling for many-row-per-group and
> few-rows-per-group, because those seem fundamentally different.
>

I think the underlying question is whether we want to treat this as an
emergency safety net against OOM (which should be a rare event in most
cases), or as something that allows us to pick hash aggregate more
often.

It would be great to get something that performs better than just
falling back to sort (and that's what I was advocating for), but I'm
worried we might be moving the goalposts way too far.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
