Re: Default setting for enable_hashagg_disk

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-10 00:18:52
Message-ID: 20200710001852.GC26220@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Thu, Jul 09, 2020 at 06:58:40PM -0400, Stephen Frost wrote:
> * Peter Geoghegan (pg(at)bowt(dot)ie) wrote:
> > On Thu, Jul 9, 2020 at 7:03 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > It makes more sense than simply ignoring what our users will see as a
> > simple regression. (Though I still lean towards fixing the problem by
> > introducing hash_mem, which at least tries to fix the problem head
> > on.)
>
> The presumption that this will always end up resulting in a regression
> really doesn't seem sensible to me.

Nobody said "always" - we're concerned about a fraction of workloads which
regress, badly affecting only only a small fraction of users.

Maybe pretend that Jeff implemented something called CashAgg, which does
everything HashAgg does but implemented from scratch. Users would be able to
tune it or disable it, and we could talk about removing HashAgg for the next 3
years. But instead we decide to remove HashAgg right now since it's redundant.
That's a bad way to transition from an old behavior to a new one. It's scary
because it imposes a burden, rather than offering a new option without also
taking away the old one.

> > That's not the only justification. The other justification is that
> > it's generally reasonable to prefer giving hash aggregate more memory.
>
> Sure, and it's generally reasonably to prefer giving Sorts more memory
> too... as long as you've got it available.

I believe he meant:
"it's generally reasonable to prefer giving hash aggregate more memory THAN OTHER NODES"

Referencing:
https://www.postgresql.org/message-id/CAH2-Wz=YEMOeXdAPwZo7uriR5KPsf_RGuMHvk3HvLDVksdrwHg@mail.gmail.com
https://www.postgresql.org/message-id/CAH2-Wznd_wL+Q3sUjLN3o5F6Q5AvHSTYOozPAei2QfuYDSd4fw@mail.gmail.com
https://www.postgresql.org/message-id/CAH2-Wz=osB4oi_nH8MnosYhVVSNOm5q3=exGe-b3q6gWOgf98w@mail.gmail.com

Peter's patch and David's competing proposal aim to provide a kind of "soft"
transition, where the old behavior is still possible (whether default or not).

David's proposal (enable_hashagg=neverspill/soft) [0] isn't implemented, but
would allow returning *exactly* to the pre-13 behavior (modulo other planner
changes). That's a bit of a kludge and we'd hope to eventually remove the
neverspill/soft stuff, returning enable_* to boolean.

Whereas Peter's hash_mem patch allows returning to the pre-13 behavior if the
mem required for hashagg is between work_mem and hash_mem; but, by definition,
not if the required mem exceeds hash_mem, in which case it would spill to disk.
This has the advantage that it's independently useful and not a transitional
kludge.

--
Justin

[0] Bruce proposed some "naming things" amendments to David's idea:
https://www.postgresql.org/message-id/CAApHDvqFZikXhAGW=UKZKq1_FzHy+XzmUzAJiNj6RWyTHH4UfA@mail.gmail.com
https://www.postgresql.org/message-id/20200625171756.GB12486@momjian.us

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Peter Geoghegan 2020-07-10 01:21:07 Re: Default setting for enable_hashagg_disk
Previous Message Stephen Frost 2020-07-10 00:08:42 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-07-10 00:22:04 Re: Intermittent BRIN failures on hyrax and lousyjack
Previous Message Thomas Munro 2020-07-10 00:15:44 Re: Collation versioning