
Re: Default setting for enable_hashagg_disk (hash_mem)

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Default setting for enable_hashagg_disk (hash_mem)
Date:	2020-07-04 09:19:46
Message-ID:	CAA4eK1KfPi6iz0hWxBLZzfVOG_NvOVJL=9UQQirWLpaN=kANTQ@mail.gmail.com
Lists:	pgsql-docs pgsql-hackers

On Fri, Jul 3, 2020 at 7:38 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> On Thu, Jul 2, 2020 at 08:35:40PM -0700, Peter Geoghegan wrote:
> > But the problem isn't really the hashaggs-that-spill patch itself.
> > Rather, the problem is the way that work_mem is supposed to behave in
> > general, and the impact that that has on hash aggregate now that it
> > has finally been brought into line with every other kind of executor
> > node. There just isn't much reason to think that we should give the
> > same amount of memory to a groupagg + sort as a hash aggregate. The
> > patch more or less broke an existing behavior that is itself
> > officially broken. That is, the problem that we're trying to fix here
> > is only a problem to the extent that the previous scheme isn't really
> > operating as intended (because grouping estimates are inherently very
> > hard). A revert doesn't seem like it helps anyone.
> >
> > I accept that the idea of inventing hash_mem to fix this problem now
> > is unorthodox. In a certain sense it solves problems beyond the
> > problems that we're theoretically obligated to solve now. But any
> > "more conservative" approach that I can think of seems like a big
> > mess.
>
> We don't even have a user report yet of a
> regression compared to PG 12, or one that can't be fixed by increasing
> work_mem.
>

Yeah, this is exactly the same point I have raised above. I feel we
should wait before designing any solution to match pre-13 behavior for
hashaggs, to see what percentage of users face problems related to
this and how much of a problem it is for them to increase work_mem to
avoid the regression. Say, if fewer than 1% of users face this problem
and some of them are happy to just increase work_mem, then we might
not need to do anything. OTOH, if 10% of users face this problem and
most of them don't want to increase work_mem, then it would be evident
that we need to do something about it, and we can probably provide a
GUC at that stage for them to revert to the old behavior and do some
advanced solution in the master branch. I am not sure what the right
thing to do here is, but it seems to me we are designing a solution
based on the assumption that we will have a lot of users who will be
hit by this problem and would be unhappy with the new behavior.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Re: Default setting for enable_hashagg_disk (hash_mem) at 2020-07-03 14:08:08 from Bruce Momjian

Responses

Re: Default setting for enable_hashagg_disk (hash_mem) at 2020-07-04 20:53:58 from Jeff Davis
