Re: Default setting for enable_hashagg_disk

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-24 09:06:28
Message-ID: 20200624090628.GA19640@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Wed, Jun 24, 2020 at 02:11:57PM +1200, David Rowley wrote:
> On Tue, 23 Jun 2020 at 08:24, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > Another way of looking at it is that the weird behavior is already
> > there in v12, so there are already users relying on this weird behavior
> > as a crutch for some other planner mistake. The question is whether we
> > want to:
> >
> > (a) take the weird behavior away now as a consequence of implementing
> > disk-based HashAgg; or
> > (b) support the weird behavior forever; or
> > (c) introduce a GUC now to help transition away from the weird behavior
> >
> > The danger with (c) is that it gives users more time to become more
> > reliant on the weird behavior; and worse, a GUC could be seen as an
> > endorsement of the weird behavior rather than a path to eliminating it.
> > So we could intend to do (c) and end up with (b). We can mitigate this
> > with documentation warnings, perhaps.
>
> So, I have a few thoughts on this subject. I understand both problem
> cases have been mentioned before on this thread, but just to reiterate
> the two problem cases that we really would rather people didn't hit.

I appreciated this summary since I wasn't fully following the issues.

> As for GUCs to try to help the group of users who, *I'm certain*, will
> have problems with PG13's plan choice. I think the overloaded
> enable_hashagg option is a really nice compromise. We don't really
> have any other executor node type that has multiple GUCs controlling
> its behaviour, so I believe it would be nice to keep it that way.

So, in trying to anticipate how users will be affected by an API change,
I try to look at similar cases where we already have this behavior, and
how users react to this. Looking at the available join methods, I think
we have one. We currently support:

* nested loop with sequential scan
* nested loop with index scan
* hash join
* merge join

It would seem merge join has almost the same complexities as the new
hash join code, since it can spill to disk doing sorts for merge joins,
and adjusting work_mem is the only way to control that spill to disk. I
don't remember anyone complaining about spills to disk during merge
join, so I am unclear why we would need a such control for hash join.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message David Rowley 2020-06-24 12:24:29 Re: Default setting for enable_hashagg_disk
Previous Message Justin Pryzby 2020-06-24 03:14:43 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-06-24 09:07:36 Re: EXPLAIN: Non-parallel ancestor plan nodes exclude parallel worker instrumentation
Previous Message ROS Didier 2020-06-24 09:05:30 PostgreSQL and big data - FDW