Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-13 16:20:32
Message-ID: CAH2-WzkRXh+3bQ8cBB5RNH-eVrjtEP+0JTymGeRW1pvffx5VCw@mail.gmail.com
Lists: pgsql-docs pgsql-hackers

On Mon, Jul 13, 2020 at 6:13 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Yes, increasing work_mem isn't unusual, at all.

It's unusual as a way of avoiding OOMs!

> Eh? That's not at all what it looks like- they were getting OOM's
> because they set work_mem to be higher than the actual amount of memory
> they had and the Sort before the GroupAgg was actually trying to use all
> that memory. The HashAgg ended up not needing that much memory because
> the aggregated set wasn't actually that large. If anything, this shows
> exactly what Jeff's fine work here is (hopefully) going to give us- the
> option to plan a HashAgg in such cases, since we can accept spilling to
> disk if we end up underestimate, or take advantage of that HashAgg
> being entirely in memory if we overestimate.

I very specifically said that it wasn't a case where something like
hash_mem would be expected to make all the difference.
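For context, the spill behavior the quoted paragraph describes can be sketched as follows. This is a minimal illustration assuming a v13 server; the table and column names are hypothetical, and the point is only that a hash aggregate bounded by work_mem can now spill instead of exhausting memory:

```sql
-- Hypothetical example: "events" and "user_id" are made-up names.
-- With a bounded work_mem, a v13 HashAggregate that underestimated
-- the number of groups spills to disk rather than growing unbounded.
SET work_mem = '4MB';

EXPLAIN (ANALYZE, COSTS OFF)
SELECT user_id, count(*)
FROM events
GROUP BY user_id;
-- When the hash table exceeds work_mem, the HashAggregate node in the
-- v13 EXPLAIN ANALYZE output reports "Batches" greater than 1 along
-- with peak memory and disk usage, rather than overrunning the limit
-- as earlier releases did.
```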

> Having looked back, I'm not sure that I'm really in the minority
> regarding the proposal to add this at this time either- there's been a
> few different comments that it's too late for v13 and/or that we should
> see if we actually end up with users seriously complaining about the
> lack of a separate way to specify the memory for a given node type,
> and/or that if we're going to do this then we should have a broader set
> of options covering other nodes types too, all of which are positions
> that I agree with.

By proposing to do nothing at all, you are very clearly in a small
minority. While (for example) I might have debated the details with
David Rowley a lot recently, and you couldn't exactly say that we're
in agreement, our two positions are nevertheless relatively close
together.

AFAICT, the only other person that has argued that we should do
nothing (have no new GUC) is Bruce, which was a while ago now. (Amit
said something similar, but has since softened his opinion [1]).

[1] https://postgr.es/m/CAA4eK1+KMSQuOq5Gsj-g-pYec_8zgGb4K=xRznbCccnaumFqSA@mail.gmail.com
--
Peter Geoghegan
