Re: Default setting for enable_hashagg_disk

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-10 13:43:51
Message-ID: 20200710134350.GH12375@tamriel.snowman.net
Lists: pgsql-docs pgsql-hackers

Greetings,

* Justin Pryzby (pryzby(at)telsasoft(dot)com) wrote:
> On Thu, Jul 09, 2020 at 06:58:40PM -0400, Stephen Frost wrote:
> > * Peter Geoghegan (pg(at)bowt(dot)ie) wrote:
> > > On Thu, Jul 9, 2020 at 7:03 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > It makes more sense than simply ignoring what our users will see as a
> > > simple regression. (Though I still lean towards fixing the problem by
> > > introducing hash_mem, which at least tries to fix the problem head
> > > on.)
> >
> > The presumption that this will always end up resulting in a regression
> > really doesn't seem sensible to me.
>
> Nobody said "always" - we're concerned about a fraction of workloads which
> regress, badly affecting only a small fraction of users.

And those workloads would be addressed by increasing work_mem, no? Why
are we inventing something new here for something that'll only impact a
small fraction of users in a small fraction of cases and where there's
already a perfectly workable way to address the issue?

> Maybe pretend that Jeff implemented something called CashAgg, which does
> everything HashAgg does but implemented from scratch. Users would be able to
> tune it or disable it, and we could talk about removing HashAgg for the next 3
> years. But instead we decide to remove HashAgg right now since it's redundant.
> That's a bad way to transition from an old behavior to a new one. It's scary
> because it imposes a burden, rather than offering a new option without also
> taking away the old one.

We already have enable_hashagg. Users are free to disable it. This
makes it also respect work_mem, allowing users to tune that value to
adjust how much memory HashAgg actually uses.

> > > That's not the only justification. The other justification is that
> > > it's generally reasonable to prefer giving hash aggregate more memory.
> >
> > Sure, and it's generally reasonable to prefer giving Sorts more memory
> > too... as long as you've got it available.
>
> I believe he meant:
> "it's generally reasonable to prefer giving hash aggregate more memory THAN OTHER NODES"

If we were developing a holistic view of memory usage, with an overall
cap on how much memory is allowed to be used for a query, then that
would be an interesting thing to consider and discuss. That's not what
any of this is.

Thanks,

Stephen
