Re: Spilling hashed SetOps and aggregates to disk

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-05 17:52:09
Message-ID: 20180605175209.vavuqe4idovcpeie@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-06-05 10:47:49 -0700, Jeff Davis wrote:
> The thing I don't like about it is that it requires running two memory-
> hungry operations at once. How much of work_mem do we use for sorted
> runs, and how much do we use for the hash table?

Is that necessarily true? I'd assume that we'd use a small amount of
memory for the tuplesort, just enough to avoid an unnecessary disk spill
for each individual tuple. A few kB should be enough - I think it's fine
to spill to disk aggressively, since by that point we have already
handled the case of a smaller number of input rows. Then, at the end of
the run, we empty out the hashtable and free it. Only then do we do the
sort.
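To make the proposed scheme concrete, here is a rough sketch (not
PostgreSQL code; the names, the group-count memory budget, and the
in-memory list standing in for a small-buffer tuplesort are all
illustrative assumptions): hash-aggregate until the memory budget is
hit, spill tuples belonging to new groups, emit and free the hash table
at the end of the run, and only then sort and aggregate the spilled
tuples.

```python
def hybrid_hash_agg(tuples, work_mem_groups, init, trans):
    """Aggregate (key, value) pairs with a bounded hash table.

    work_mem_groups: max number of groups held in the in-memory table
    init: initial aggregate state; trans: transition function.
    (Hypothetical sketch of the approach discussed, not the real code.)
    """
    table = {}
    spilled = []  # stands in for a tuplesort with only a few kB of buffer
    for key, val in tuples:
        if key in table:
            # Group already in memory: keep aggregating in place.
            table[key] = trans(table[key], val)
        elif len(table) < work_mem_groups:
            table[key] = trans(init, val)
        else:
            # Budget exhausted: spill aggressively; a real implementation
            # would write this to a sorted on-disk run.
            spilled.append((key, val))
    # End of the run: emit and free the hash table first ...
    results = list(table.items())
    table = None
    # ... only then sort the spilled tuples and aggregate group-wise.
    spilled.sort()
    i = 0
    while i < len(spilled):
        key = spilled[i][0]
        state = trans(init, spilled[i][1])
        i += 1
        while i < len(spilled) and spilled[i][0] == key:
            state = trans(state, spilled[i][1])
            i += 1
        results.append((key, state))
    return results
```

Note that groups that made it into the table keep aggregating in memory
even after spilling starts, so only tuples for not-yet-seen groups go
through the sort path.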

One thing this wouldn't handle is datatypes that support hashing but
not sorting. Not exactly common, though.

Greetings,

Andres Freund
