Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-12 00:08:31
Message-ID: CAH2-Wz=19NR-sNxAbstxtrOgmLWJRCpCfB+1zM247rf+iDvPcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Sat, Jul 11, 2020 at 4:23 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I find that example rather suspicious. I mean, what exactly in the
> GroupAgg plan would consume this memory? Surely it'd have to be some
> node below the grouping, but sort shouldn't do that, no?
>
> Seems strange.

Well, I imagine hash aggregate manages to use much less memory than
the equivalent groupagg's sort, even though to the optimizer it
appears as if hash agg should end up using more memory (which is not
allowed by the optimizer when it exceeds work_mem, regardless of
whether or not it's faster). It may also be relevant that Hash agg can
use less memory simply by being faster. Going faster could easily
reduce the memory usage for the system as a whole, even when you
assume individual group agg nodes use more memory for as long as they
run. So in-memory hash agg is effectively less memory hungry.

It's not a great example of a specific case that we'd regress by not
having hash_mem/hash_mem_multiplier. It's an overestimate where older
releases accidentally got a bad, slow plan, not an underestimate where
older releases "lived beyond their means but got away with it" by
getting a good, fast plan. ISTM that the example is a good example of
the strange dynamics involved.

> I agree grouping estimates are often quite off, and I kinda agree with
> introducing hash_mem (or at least with the concept that hashing is more
> sensitive to amount of memory than sort). Not sure it's the right espace
> hatch to the hashagg spill problem, but maybe it is.

The hash_mem/hash_mem_multiplier proposal aims to fix the problem
directly, and not be an escape hatch, because we don't like escape
hatches. I think that that probably fixes many or most of the problems
in practice, at least assuming that the admin is willing to tune it.
But a small number of remaining installations may still need a "true"
escape hatch. There is an argument for having both, though I hope that
the escape hatch can be avoided.

--
Peter Geoghegan

In response to

Browse pgsql-docs by date

  From Date Subject
Next Message Tomas Vondra 2020-07-12 00:30:37 Re: Default setting for enable_hashagg_disk
Previous Message Peter Geoghegan 2020-07-11 23:46:04 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-07-12 00:11:01 Re: POC: postgres_fdw insert batching
Previous Message Sascha Kuhl 2020-07-12 00:06:56 Re: WIP: BRIN multi-range indexes