Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-13 16:20:32
Message-ID: CAH2-WzkRXh+3bQ8cBB5RNH-eVrjtEP+0JTymGeRW1pvffx5VCw@mail.gmail.com
Lists: pgsql-docs pgsql-hackers

On Mon, Jul 13, 2020 at 6:13 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Yes, increasing work_mem isn't unusual, at all.

It's unusual as a way of avoiding OOMs!

> Eh? That's not at all what it looks like- they were getting OOM's
> because they set work_mem to be higher than the actual amount of memory
> they had and the Sort before the GroupAgg was actually trying to use all
> that memory. The HashAgg ended up not needing that much memory because
> the aggregated set wasn't actually that large. If anything, this shows
> exactly what Jeff's fine work here is (hopefully) going to give us- the
> option to plan a HashAgg in such cases, since we can accept spilling to
> disk if we end up underestimate, or take advantage of that HashAgg
> being entirely in memory if we overestimate.

I very specifically said that it wasn't a case where something like
hash_mem would be expected to make all the difference.
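For context, the spill behavior the quoted paragraph describes can be sketched as follows. This is a minimal illustration assuming a v13 server; the table and column names are hypothetical, and the point is only that a hash aggregate bounded by work_mem can now spill instead of exhausting memory:

```sql
-- Hypothetical example: "events" and "user_id" are made-up names.
-- With a bounded work_mem, a v13 HashAggregate that underestimated
-- the number of groups spills to disk rather than growing unbounded.
SET work_mem = '4MB';

EXPLAIN (ANALYZE, COSTS OFF)
SELECT user_id, count(*)
FROM events
GROUP BY user_id;
-- When the hash table exceeds work_mem, the HashAggregate node in the
-- v13 EXPLAIN ANALYZE output reports "Batches" greater than 1 along
-- with peak memory and disk usage, rather than overrunning the limit
-- as earlier releases did.
```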

> Having looked back, I'm not sure that I'm really in the minority
> regarding the proposal to add this at this time either- there's been a
> few different comments that it's too late for v13 and/or that we should
> see if we actually end up with users seriously complaining about the
> lack of a separate way to specify the memory for a given node type,
> and/or that if we're going to do this then we should have a broader set
> of options covering other nodes types too, all of which are positions
> that I agree with.

By proposing to do nothing at all, you are very clearly in a small
minority. While (for example) I might have debated the details with
David Rowley a lot recently, and you couldn't exactly say that we're
in agreement, our two positions are nevertheless relatively close
together.

AFAICT, the only other person that has argued that we should do
nothing (have no new GUC) is Bruce, which was a while ago now. (Amit
said something similar, but has since softened his opinion [1]).

[1] https://postgr.es/m/CAA4eK1+KMSQuOq5Gsj-g-pYec_8zgGb4K=xRznbCccnaumFqSA@mail.gmail.com
--
Peter Geoghegan
