Re: Spilling hashed SetOps and aggregates to disk

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-07 00:17:52
Message-ID: 20180607001752.ngmdizthb2e7ksaj@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-06-07 12:11:37 +1200, David Rowley wrote:
> On 7 June 2018 at 08:11, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > On 06/06/2018 04:11 PM, Andres Freund wrote:
> >> Consider e.g. a scheme where we'd switch from hashed aggregation to
> >> sorted aggregation due to memory limits, but already have a number of
> >> transition values in the hash table. Whenever the size of the transition
> >> values in the hashtable exceeds memory size, we write one of them to the
> >> tuplesort (with serialized transition value). From then on further input
> >> rows for that group would only be written to the tuplesort, as the group
> >> isn't present in the hashtable anymore.
> >>
> >
> > Ah, so you're suggesting that during the second pass we'd deserialize
> > the transition value and then add the tuples to it, instead of building
> > a new transition value. Got it.
>
> Having to deserialize every time we add a new tuple sounds terrible
> from a performance point of view.

I didn't mean that we'd do that, and I don't think Tomas understood it
that way either. I was talking about the approach where the second pass
is sort-based rather than hash-based aggregation. There each spilled
transition value needs to be deserialized exactly once, *not* once per
input tuple.
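
To make that concrete, here is a rough model of that combine step
(illustrative Python only, not executor code; deserialize, init_state
and advance stand in for the aggregate's deserialfn, initial state and
transfn, and the input is assumed to be the spilled rows already sorted
by group key, with a group's serialized state ordered before its raw
rows):

from itertools import groupby

def combine_sorted(spilled_rows, deserialize, init_state, advance):
    # spilled_rows: (group_key, kind, payload) triples, sorted so that a
    # group's single serialized-state row ('state') comes before its raw
    # input rows ('row')
    for key, rows in groupby(spilled_rows, key=lambda r: r[0]):
        state = None
        for _, kind, payload in rows:
            if kind == 'state':
                state = deserialize(payload)   # at most once per group
            else:
                if state is None:
                    state = init_state()
                state = advance(state, payload)
        yield key, state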

> Can't we just:
>
> 1. HashAgg until the hash table reaches work_mem.
> 2. Spill the entire table to disk.
> 3. Destroy the table and create a new one.
> 4. If more tuples: goto 1
> 5. Merge sort and combine each dumped set of tuples.
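
For reference, that scheme might look roughly like the following
(purely illustrative Python, not executor code: dicts and in-memory
lists stand in for the hash table and spill files, the limit is a
simple row count rather than work_mem, and init_state, advance,
serialize, deserialize and combine are stand-ins for the aggregate
support functions):

from heapq import merge
from itertools import groupby

def hashagg_with_full_spills(tuples, limit, init_state, advance,
                             serialize, deserialize, combine):
    runs, table = [], {}
    for key, value in tuples:
        if key not in table and len(table) >= limit:
            # 2./3. spill the whole table as a sorted run, destroy it,
            #       then keep hash-aggregating (4. goto 1)
            runs.append(sorted(((k, serialize(s)) for k, s in table.items()),
                               key=lambda r: r[0]))
            table = {}
        # 1. plain hashagg while the table fits
        state = table.get(key)
        table[key] = advance(init_state() if state is None else state, value)
    runs.append(sorted(((k, serialize(s)) for k, s in table.items()),
                       key=lambda r: r[0]))
    # 5. merge-sort the runs and combine each group's partial states
    for key, group in groupby(merge(*runs, key=lambda r: r[0]),
                              key=lambda r: r[0]):
        state = None
        for _, payload in group:
            part = deserialize(payload)
            state = part if state is None else combine(state, part)
        yield key, state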

Well, that requires sorting without really buying us much. I was
thinking more of:

1. HashAgg until the hash table reaches work_mem.
2. If the tuple's group is already in the hash table, advance() the
transition value there (subject to 3). Otherwise put the tuple into a
tuplestore and continue with the next input tuple.
3. If advancing would grow the transition value too much, spill the
tuple into the tuplestore instead.
4. Project the tuples from the hash table, with the serialized transition
state as an extra column, then destroy it.
5. Sort the tuplestore with a comparator that sorts the extra column with
serialized states first within each group. Aggregate and project.
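
In pseudo-code-ish terms (again purely illustrative: a dict and a list
stand in for the hash table and tuplestore, the limit is a row count
rather than work_mem, and too_big, init_state, advance and serialize
are made-up stand-ins for the real size check and aggregate support
functions):

def hashagg_with_tuple_spills(tuples, limit, too_big,
                              init_state, advance, serialize):
    table, store = {}, []
    for key, value in tuples:
        if key in table:
            if too_big(table[key], value):
                # 3. advancing would grow the transition value too much:
                #    spill the raw tuple instead
                store.append((key, 'row', value))
            else:
                # 2. group already hashed: advance in place
                table[key] = advance(table[key], value)
        elif len(table) < limit:
            # 1. hashagg until the table hits the limit
            table[key] = advance(init_state(), value)
        else:
            # 2. new group once the table is full: spill the raw tuple
            store.append((key, 'row', value))
    # 4. project the hash table, with the serialized state as an extra column
    for key, state in table.items():
        store.append((key, 'state', serialize(state)))
    table = None
    # 5. sort so each group's serialized state sorts before its raw tuples
    store.sort(key=lambda r: (r[0], r[1] != 'state'))
    return store

The sorted store can then be handed to something like the
combine_sorted() sketch further up, so each spilled transition value
gets deserialized at most once.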

Greetings,

Andres Freund
