Re: Spilling hashed SetOps and aggregates to disk

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-07 08:20:35
Message-ID: 34475b4f-54aa-add3-3845-20e8132842f9@2ndquadrant.com
Lists: pgsql-hackers

On 06/07/2018 02:18 AM, Andres Freund wrote:
> On 2018-06-06 17:17:52 -0700, Andres Freund wrote:
>> On 2018-06-07 12:11:37 +1200, David Rowley wrote:
>>> On 7 June 2018 at 08:11, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>> On 06/06/2018 04:11 PM, Andres Freund wrote:
>>>>> Consider e.g. a scheme where we'd switch from hashed aggregation to
>>>>> sorted aggregation due to memory limits, but already have a number of
>>>>> transition values in the hash table. Whenever the size of the transition
>>>>> values in the hashtable exceeds memory size, we write one of them to the
>>>>> tuplesort (with serialized transition value). From then on further input
>>>>> rows for that group would only be written to the tuplesort, as the group
>>>>> isn't present in the hashtable anymore.
>>>>>
>>>>
>>>> Ah, so you're suggesting that during the second pass we'd deserialize
>>>> the transition value and then add the tuples to it, instead of building
>>>> a new transition value. Got it.
>>>
>>> Having to deserialize every time we add a new tuple sounds terrible
>>> from a performance point of view.
>>
>> I didn't mean that we do that, and I don't think David understood it as
>> that either. I was talking about the approach where the second pass is a
>> sort rather than hash based aggregation. Then we would *not* need to
>> deserialize more than exactly once.
>
> s/David/Tomas/, obviously. Sorry, it's been a long day.
>

Solution is simple: drink more coffee. ;-)

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
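
The scheme Andres describes in the quoted text above (hash aggregation that evicts serialized transition values into a sort once memory runs out, followed by a sort-based second pass) can be sketched roughly as below. This is only a toy illustration under stated assumptions, not PostgreSQL code: the function name `hash_then_sort_aggregate`, the `budget` parameter (counted in collected values rather than bytes), the array_agg-like list used as transition value, the choice of eviction victim, and the use of `pickle` in place of the aggregate's serialization function are all made up for the example. A plain Python list stands in for the tuplesort.

```python
import pickle
from itertools import groupby

STATE, ROW = 0, 1  # sort tags: a serialized state sorts before raw rows of its group


def hash_then_sort_aggregate(rows, budget):
    """Collect values per key (array_agg-like) while holding at most
    `budget` values in the in-memory hash table (hypothetical limit)."""
    table = {}        # hash table: key -> transition value (list of values)
    used = 0          # number of values currently held in the hash table
    spilling = False  # becomes True once the memory budget was exceeded
    spill = []        # stand-in for the tuplesort: (key, tag, payload)

    for key, value in rows:
        if spilling and key not in table:
            # The group is not in the hash table (evicted, or never added),
            # so the input row is only written to the sort.
            spill.append((key, ROW, value))
            continue
        table.setdefault(key, []).append(value)
        used += 1
        while used > budget:
            # Over budget: write one transition value to the sort in
            # serialized form. dict.popitem() picks the most recently
            # added group, an arbitrary choice for this sketch.
            spilling = True
            victim, state = table.popitem()
            used -= len(state)
            spill.append((victim, STATE, pickle.dumps(state)))

    # Groups that stayed in the hash table are already complete.
    result = dict(table)

    # Second pass: sort-based aggregation over the spilled data. The sort
    # is stable, so raw rows of a group keep their arrival order.
    spill.sort(key=lambda item: item[:2])
    for key, items in groupby(spill, key=lambda item: item[0]):
        state = []
        for _, tag, payload in items:
            if tag == STATE:
                state = pickle.loads(payload)  # at most once per group
            else:
                state.append(payload)
        result[key] = state
    return result
```

Because a group can only enter the hash table before spilling starts, each group has at most one serialized transition value in the sort, and since it sorts ahead of that group's raw rows, the second pass deserializes it once and then simply keeps advancing it.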
