Re: Default setting for enable_hashagg_disk

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-09 22:58:40
Message-ID: 20200709225840.GE12375@tamriel.snowman.net
Lists: pgsql-docs pgsql-hackers

Greetings,

* Peter Geoghegan (pg(at)bowt(dot)ie) wrote:
> On Thu, Jul 9, 2020 at 7:03 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > The one for the planner is already there, and it looks like we need one
> > > for the executor as well (to tell HashAgg to ignore the memory limit
> > > just like v12).
> >
> > No, ignoring the limit set was, as agreed above, a bug, and I don't
> > think it makes sense to add some new user tunable for this.
>
> It makes more sense than simply ignoring what our users will see as a
> simple regression. (Though I still lean towards fixing the problem by
> introducing hash_mem, which at least tries to fix the problem head
> on.)

The presumption that this will always end up resulting in a regression
really doesn't seem sensible to me. We could rip out the logic in Sort
that spills to disk and see how much faster it gets- as long as we don't
actually run out of memory- but avoiding that is the entire point of
having some kind of limit on the amount of memory we use, isn't it?
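For illustration (the table and sizes here are hypothetical, and exact
EXPLAIN output varies by version), the spill behavior Sort already honors
is visible directly in EXPLAIN ANALYZE:

```sql
-- With a small work_mem, Sort spills to disk rather than exceeding the limit.
SET work_mem = '1MB';
EXPLAIN (ANALYZE) SELECT * FROM big_table ORDER BY some_col;
-- Sort Method: external merge  Disk: ...kB

-- With a larger budget, the same sort stays in memory.
SET work_mem = '512MB';
EXPLAIN (ANALYZE) SELECT * FROM big_table ORDER BY some_col;
-- Sort Method: quicksort  Memory: ...kB
```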

> > If folks
> > want to let HashAgg use more memory then they can set work_mem higher,
> > just the same as if they want a Sort node to use more memory or a
> > HashJoin. Yes, that comes with potential knock-on effects about other
> > nodes (possibly) using more memory but that's pretty well understood for
> > all the other cases and I don't think that it makes sense to have a
> > special case for HashAgg when the only justification is that "well, you
> > see, it used to have this bug, so...".
>
> That's not the only justification. The other justification is that
> it's generally reasonable to prefer giving hash aggregate more memory.

Sure, and it's generally reasonable to prefer giving Sorts more memory
too... as long as you've got it available.
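For example, giving a particular session's nodes more memory is a
one-line change, with the well-understood caveat that every sort and
hash node in each of that session's queries may then claim that much:

```sql
-- Per-operation memory budget for this session only; each sort/hash
-- node in a query may use up to this amount independently.
SET work_mem = '256MB';
```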

> This would even be true in a world where all grouping estimates were
> somehow magically accurate. These two justifications coincide in a way
> that may seem a bit too convenient to truly be an accident of history.
> And if they do: I agree. It's no accident.

I disagree that HashAgg's inability to spill to disk existed because,
for this one particular node, we should always just give it however much
memory it needs, regardless of whether that's anywhere near how much we
thought it'd need.

> It seems likely that we have been "complicit" in enabling
> "applications that live beyond their means", work_mem-wise. We knew
> that hash aggregate had this "bug" forever, and yet we were reasonably
> happy to have it be the common case for years. It's very fast, and
> didn't actually explode most of the time (even though grouping
> estimates are often pretty poor). Hash agg was and is the common case.

I disagree that we were reasonably happy with this bug or that it
somehow makes sense to retain it. HashAgg is certainly commonly used,
but that's not really relevant- it's still going to be used quite a bit.
It's just that, now, when our estimates are far wrong, we won't simply
gobble up all the available memory- instead we'll spill to disk, just
like we do with the other nodes.
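Under the v13 behavior being discussed, a HashAgg whose group-count
estimate proves far too low reports its spilling in EXPLAIN ANALYZE
rather than overshooting work_mem; roughly (hypothetical table, output
abbreviated):

```sql
SET work_mem = '4MB';
EXPLAIN (ANALYZE) SELECT some_key, count(*) FROM big_table GROUP BY some_key;
-- HashAggregate
--   Batches: 33  Memory Usage: ...kB  Disk Usage: ...kB
```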

> Yes, we were concerned about the risk of OOM for many years, but it
> was considered a risk worth taking. We knew what the trade-off was. We
> never quite admitted it, but what does it matter?

This is not some well designed feature of HashAgg that had a lot of
thought put into it, whereby the community agreed that we should just
let it be and hope no one noticed or got bit by it- I have certainly
managed to kill servers with a HashAgg gone bad, and I seriously doubt
I'm alone in that.

> Our own tacit attitude towards hash agg + work_mem mirrors that of our
> users (or at least the users that will be affected by this issue, of
> which there will be plenty). Declaring this behavior a bug with no
> redeeming qualities now seems a bit rich.

No, I disagree entirely.

Thanks,

Stephen
