Re: Default setting for enable_hashagg_disk

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-12 00:30:37
Message-ID: 20200712003037.wzrcltujwzwiwlfr@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Sat, Jul 11, 2020 at 09:02:43AM -0700, David G. Johnston wrote:
>On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
>> There now seems to be some suggestions that not only should we have a
>> new GUC, but we should default to having it not be equal to work_mem (or
>> 1.0 or whatever) and instead by higher, to be *twice* or larger whatever
>> the existing work_mem setting is- meaning that people whose systems are
>> working just fine and have good estimates that represent their workload
>> and who get the plans they want may then start seeing differences and
>> increased memory utilization in places that they *don't* want that, all
>> because we're scared that someone, somewhere, might see a regression due
>> to HashAgg spilling to disk.
>>
>
>If that increased memory footprint allows the planner to give me a better
>plan with faster execution and with no OOM I'd be very happy that this
>change happened. While having a more flexible memory allocation framework
>is not a primary goal in and of itself it is a nice side-effect. I'm not
>going to say "let's only set work_mem to 32MB instead of 48MB so I can
>avoid this faster HashAgg node and instead execute a nested loop (or
>whatever)". More probable is the user whose current nested loop plan is
>fast enough and doesn't even realize that with a bit more memory they could
>get an HashAgg that performs 15% faster. For them this is a win on its
>face.
>
>I don't believe this negatively impacts the super-admin in our user-base
>and is a decent win for the average and below average admin.
>
>Do we really have an issue with plans being chosen while having access to
>more memory being slower than plans chosen while having less memory?
>
>The main risk here is that we choose for a user to consume more memory than
>they expected and they report OOM issues to us. We tell them to set this
>new GUC to 1.0. But that implies they are getting many non-HashAgg plans
>produced when with a bit more memory those HashAgg plans would have been
>chosen. If they get those faster plans without OOM it's a win, if it OOMs
>it's a loss. I'm feeling optimistic here and we'll get considerably more
>wins than losses. How loss-averse do we need to be here though? Npte we
>can give the upgrading user advance notice of our loss-aversion level and
>they can simply disagree and set it to 1.0 and/or perform more thorough
>testing. So being optimistic feels like the right choice.
>

I don't know, but one of the main arguments against simply suggesting
people to bump up work_mem (if they're hit by the hashagg spill in v13)
was that it'd increase overall memory usage for them. It seems strange
to then propose a new GUC set to a default that would result in higher
memory usage *for everyone*.

Of course, having such GUC with a default a multiple of work_mem might
be a win overall - or maybe not. I don't have a very good idea how many
people will get bitten by this, and how many will get speedup (and how
significant the speedup would be).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Tom Lane 2020-07-12 00:47:54 Re: Default setting for enable_hashagg_disk
Previous Message Peter Geoghegan 2020-07-12 00:08:31 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-07-12 00:43:31 Re: StartupMessage parameters - free-form or not?
Previous Message Jaka Jančar 2020-07-12 00:14:27 StartupMessage parameters - free-form or not?