Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-29 21:46:38
Message-ID: CAH2-Wzmt+b6=TQmavHKbsiD9g4R+iFaMEK89P88Bq5w=vgdnTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Mon, Jun 29, 2020 at 2:22 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Can you give and example of what you mean by being too permissive or too
> conservative? Do you mean the possibility of unlimited memory usage in
> v12, and strict enforcement in v13?

Yes -- that's all I meant.

> IMO enforcing the work_mem limit (in v13) is right in principle, but I
> do understand the concerns about unexpected regressions compared to v12.

Yeah. Both of these two things are true at the same time.

> I don't understand what you mean by "less memory" when the whole issue
> is significantly exceeding work_mem?

I was just reiterating what I said a few times already: Not using an
in-memory hash aggregate when the amount of memory required is high
but not prohibitively high is penny wise, pound foolish. It's easy to
imagine this actually using up more memory when an entire workload is
considered. This observation does not apply to a system that only ever
has one active connection at a time, but who cares about that?

> I don't think the OOM is the only negative performance here - using a
> lot of memory also forces eviction of data from page cache (although
> writing a lot of temporary files may have similar effect).

True.

> I agree with this, and I'm mostly OK with having hash_mem. In fact, from
> the proposals in this thread I like it the most - as long as it's used
> both during planning and execution. It's a pretty clear solution.

Great.

It's not trivial to write the patch, since there are a few tricky
cases. For example, maybe there is some subtlety in nodeAgg.c with
AGG_MIXED cases. Is there somebody else that knows that code better
than I do that wants to have a go at writing a patch?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Fujii Masao 2020-06-30 05:56:50 Re: The description for pg_replication_slots.restart_lsn
Previous Message Tomas Vondra 2020-06-29 21:33:19 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2020-06-29 22:23:49 Re: track_planning causing performance regression
Previous Message Tomas Vondra 2020-06-29 21:33:19 Re: Default setting for enable_hashagg_disk