Re: Default setting for enable_hashagg_disk

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-10 14:17:14
Message-ID: 20200710141714.GI12375@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

Greetings,

* Peter Geoghegan (pg(at)bowt(dot)ie) wrote:
> On Thu, Jul 9, 2020 at 5:08 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > I didn't, and don't, think it particularly relevant to the discussion,
> > but if you don't like the comparison to Sort then we could compare it to
> > a HashJoin instead- the point is that, yes, if you are willing to give
> > more memory to a given operation, it's going to go faster, and the way
> > users tell us that they'd like the query to use more memory is already
> > well-defined and understood to be through work_mem. We do not do them a
> > service by ignoring that.
>
> The hash_mem design (as it stands) would affect both hash join and
> hash aggregate. I believe that it makes most sense to have hash-based
> nodes under the control of a single GUC. I believe that this
> granularity will cause the least problems. It certainly represents a
> trade-off.

So, now this has moved from being a hack to deal with a possible
regression for a small number of users due to new behavior in one node,
to a change that has impacts on other nodes that hadn't been changed,
all happening during beta.

No, I don't agree with this. Now is not the time for designing new
features for v13.

> work_mem is less of a problem with hash join, primarily because join
> estimates are usually a lot better than grouping estimates. But it is
> nevertheless something that it makes sense to put in the same
> conceptual bucket as hash aggregate, pending a future root and branch
> redesign of work_mem.

I'm still not thrilled with the 'hash_mem' kind of idea as it's going in
the wrong direction because what's actually needed is a way to properly
consider and track overall memory usage- a redesign of work_mem (or some
new parameter, but it wouldn't be 'hash_mem') as you say, but all of
this discussion should be targeting v14.

> Like I said, the escape hatch GUC is not my preferred solution. But at
> least it acknowledges the problem. I don't think that anyone (or
> anyone else) believes that work_mem doesn't have serious limitations.

work_mem obviously has serious limitations, but that's not novel or new
or unexpected by anyone.

> > We have a parameter which already drives this and which users are
> > welcome to (and quite often do) tune. I disagree that anything further
> > is either essential or particularly desirable.
>
> This is a user hostile attitude.

I don't find that argument convincing, at all.

> > I'm really rather astounded at the direction this has been going in.
>
> Why?

Due to the fact that we're in beta and now is not the time to be
redesigning this feature. What Jeff implemented was done in a way that
follows the existing structure for how all of the other nodes work and
how HashAgg was *intended* to work (as in- if we thought the HashAgg
would go over work_mem, we wouldn't pick it and would do a GroupAgg
instead). If there's bugs in his implementation (which I doubt, but it
can happen, of course) then that'd be useful to discuss and look at
fixing, but this discussion isn't appropriate for beta.

Thanks,

Stephen

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Stephen Frost 2020-07-10 14:34:15 Re: Default setting for enable_hashagg_disk
Previous Message Stephen Frost 2020-07-10 13:43:51 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2020-07-10 14:34:15 Re: Default setting for enable_hashagg_disk
Previous Message Tom Lane 2020-07-10 14:10:46 Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions