Re: Default setting for enable_hashagg_disk

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-27 10:00:25
Message-ID: CAA4eK1K0cgk_8hRyxsvppgoh_Z-NY+UZTcFWB2we6baJ9DXCQw@mail.gmail.com
Lists: pgsql-docs pgsql-hackers

On Thu, Jun 25, 2020 at 12:59 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> So, I don't think we can wire in a constant like 10x. That's really
> unprincipled and I think it's a bad idea. What we could do, though, is
> replace the existing Boolean-valued GUC with a new GUC that controls
> the size at which the aggregate spills. The default could be -1,
> meaning work_mem, but a user could configure a larger value if desired
> (presumably, we would just treat a value smaller than work_mem as
> work_mem, and document the same).
>
> I think that's actually pretty appealing. Separating the memory we
> plan to use from the memory we're willing to use before spilling seems
> like a good idea in general, and I think we should probably also do it
> in other places - like sorts.
>

+1. I also think a GUC along these lines could help not only with the
problem being discussed here but in other cases as well. However, I
think the real question is whether we want to design and implement it
for PG13. It seems to me that at this stage we don't have a clear
understanding of what percentage of real-world cases will be impacted
by the new behavior of hash aggregates (spilling to disk once work_mem
is exceeded). We want to provide some mechanism as a safety net against
problems users might face, which is not a bad idea, but what if we wait
and see the real impact of this? Would it be so bad to add a GUC later
in a back branch if we see users hitting such problems often? I think
the advantage of delaying it is that we might see some real problems
(like cases where hash aggregate is not a good choice) that can be
fixed via the costing model.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
