Re: Spilling hashed SetOps and aggregates to disk

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-11 17:33:39
Message-ID: 46734151-3245-54ac-76fc-17742fb0e6d9@2ndquadrant.com
Lists: pgsql-hackers

On 06/11/2018 07:14 PM, Andres Freund wrote:
> Hi,
>
> On 2018-06-11 17:29:52 +0200, Tomas Vondra wrote:
>> It would be great to get something that performs better than just falling
>> back to sort (and I was advocating for that), but I'm worried we might be
>> moving the goalposts way too far.
>
> I'm unclear on why that'd have that bad performance in relevant
> cases. You're not going to hit the path unless the number of groups is
> pretty large (or work_mem is ridiculously small, in which case we don't
> care). With a large number of groups the sorting path isn't particularly
> inefficient, because repeatedly storing the input values isn't such a
> large fraction in comparison to the number of groups (and their
> transition values). Which scenarios are you concerned about?
>

Say you have a 1TB table, and keeping all the groups in memory would require
work_mem=2GB. After hitting the work_mem limit, there may still be a pretty
large number of tuples that you'd have to spill to disk and sort.

For example, if we hit the work_mem limit after processing 10% of the tuples,
switching to sort would mean spilling and sorting 900GB of data. Or we might
say: hmm, we're 10% through, so we expect to hit the limit about 10 times;
let's spill the hash table each time and then sort that, writing and sorting
only about 10GB of data. (Or merge it in some hash-based way, per Robert's
earlier message.)
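
To make the arithmetic concrete, here is a back-of-envelope sketch (just an
illustration, not executor code). The 1GB work_mem setting and the 1000GB
table size are assumptions picked to match the rough numbers above; the 10%
figure is the point at which the hash table first fills work_mem:

/*
 * Rough comparison of bytes written to disk by the two strategies
 * discussed above.  All inputs are illustrative assumptions.
 */
#include <stdio.h>

int
main(void)
{
    double  table_size_gb = 1000.0; /* ~1TB of input tuples (assumed) */
    double  work_mem_gb = 1.0;      /* actual work_mem setting (assumed) */
    double  frac_at_limit = 0.10;   /* fraction of input processed when the
                                     * hash table first fills work_mem */

    /* Strategy A: switch to sort, i.e. spill and sort all remaining input. */
    double  sort_spill_gb = (1.0 - frac_at_limit) * table_size_gb;

    /*
     * Strategy B: spill the hash table whenever it fills up.  Hitting the
     * limit after 10% of the input suggests ~10 spills, i.e. roughly
     * 10 x work_mem of partial transition states written out.
     */
    double  expected_spills = 1.0 / frac_at_limit;
    double  hash_spill_gb = expected_spills * work_mem_gb;

    printf("switch-to-sort writes ~%.0f GB\n", sort_spill_gb);   /* ~900 GB */
    printf("spill-hash-table writes ~%.0f GB\n", hash_spill_gb); /* ~10 GB */
    return 0;
}

The point being that the write volume of the hash-spill approach scales with
the size of the aggregated state, not with the size of the remaining input,
which is what makes it attractive for large tables.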

I don't quite understand the argument that the number of groups needs to be
pretty large for us to hit this. So what if the groups take 2x or 10x more
memory than work_mem? That can still be cheaper than switching to a
sort-based approach, no?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
