Re: Default setting for enable_hashagg_disk

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Peter Geoghegan <pg(at)bowt(dot)ie>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-11 01:35:43
Message-ID: CAKFQuwYjJMvXST2nbVQZwdAiqCZMFRFP=Bxx0kbkBbvsdh4EXA@mail.gmail.com
Lists: pgsql-docs pgsql-hackers

On Fri, Jul 10, 2020 at 6:19 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> If we have to have a new GUC, my preference would be hashagg_mem,
> where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
> would mean use that value. We'd need some sort of check hook to
> disallow 0-63. I really am just failing to comprehend why we're
> contemplating changing the behaviour of Hash Join here.

If we add a setting that defaults to work_mem, the benefit is severely
reduced: you still have to modify individual queries, though the change can
at least be more targeted than raising work_mem alone. I would like whatever
we do to provide that ability as well as a default value that is greater
than the current work_mem value - which in v12 was being ignored, so
production usage saw memory consumption greater than work_mem. Only a
multiplier does this. A multiplier-only solution fixes the problem at hand.
A multiplier-or-memory solution adds complexity but provides flexibility.
If adding that flexibility is straightforward, I don't see any serious
downside other than the complexity of having the meaning of a single GUC's
value depend on its magnitude.
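
For illustration only, here is a minimal standalone C sketch - not
PostgreSQL source; the GUC names (hashagg_mem, hash_mem_multiplier) and the
helper names are hypothetical, following the proposals in this thread - of
the "-1 means inherit, otherwise 64..MAX_KILOBYTES" validation together with
a multiplier-based default budget:

/*
 * Illustrative sketch only -- not PostgreSQL source.  The GUC names
 * (hashagg_mem, hash_mem_multiplier) and helper names are hypothetical,
 * taken from the proposals in this thread.
 */
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_KILOBYTES   (INT_MAX / 1024)    /* same ceiling work_mem uses */

static int    hashagg_mem = -1;             /* kB; -1 = derive from work_mem */
static int    work_mem = 4096;              /* kB; default 4MB */
static double hash_mem_multiplier = 2.0;    /* hypothetical default */

/* Check-hook logic: allow -1, disallow 0..63, allow 64..MAX_KILOBYTES. */
static bool
hashagg_mem_is_valid(int newval)
{
    return newval == -1 ||
           (newval >= 64 && newval <= MAX_KILOBYTES);
}

/* Memory budget (in kB) a hash-based node would be allowed to use. */
static long
hash_mem_limit_kb(void)
{
    if (hashagg_mem != -1)
        return hashagg_mem;                     /* explicit value wins */
    return (long) (work_mem * hash_mem_multiplier);     /* multiplier default */
}

int
main(void)
{
    printf("valid(-1)=%d valid(32)=%d valid(64)=%d\n",
           hashagg_mem_is_valid(-1),
           hashagg_mem_is_valid(32),
           hashagg_mem_is_valid(64));
    printf("hash memory budget: %ld kB\n", hash_mem_limit_kb());
    return 0;
}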

> Of course, I
> understand that that node type also uses a hash table, but why does
> that give it the right to be involved in a change that we're making to
> try and give users the ability to avoid possible regressions with Hash
> Agg?
>

If Hash Join isn't affected by the "was allowed to use unlimited amounts of
execution memory but now isn't" change, then it probably should continue to
consult work_mem rather than being changed to use the calculated value
(work_mem x multiplier).
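
Continuing the sketch above (again with hypothetical names), the distinction
being argued is simply which budget each node type would consult:

/* Only the node whose spill behaviour is changing (Hash Agg) would consult
 * the scaled budget; Hash Join would keep consulting plain work_mem. */
static long
hash_agg_budget_kb(void)
{
    return hash_mem_limit_kb();     /* hashagg_mem, or work_mem x multiplier */
}

static long
hash_join_budget_kb(void)
{
    return work_mem;                /* unchanged behaviour */
}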

David J.
