Re: Default setting for enable_hashagg_disk

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-24 12:38:43
Message-ID: 20200624123843.GX4107@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Wed, Jun 24, 2020 at 05:06:28AM -0400, Bruce Momjian wrote:
> On Wed, Jun 24, 2020 at 02:11:57PM +1200, David Rowley wrote:
> > On Tue, 23 Jun 2020 at 08:24, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > > Another way of looking at it is that the weird behavior is already
> > > there in v12, so there are already users relying on this weird behavior
> > > as a crutch for some other planner mistake. The question is whether we
> > > want to:
> > >
> > > (a) take the weird behavior away now as a consequence of implementing
> > > disk-based HashAgg; or
> > > (b) support the weird behavior forever; or
> > > (c) introduce a GUC now to help transition away from the weird behavior
> > >
> > > The danger with (c) is that it gives users more time to become more
> > > reliant on the weird behavior; and worse, a GUC could be seen as an
> > > endorsement of the weird behavior rather than a path to eliminating it.
> > > So we could intend to do (c) and end up with (b). We can mitigate this
> > > with documentation warnings, perhaps.
> >
> > So, I have a few thoughts on this subject. I understand both problem
> > cases have been mentioned before on this thread, but just to reiterate
> > the two problem cases that we really would rather people didn't hit.
>
> I appreciated this summary since I wasn't fully following the issues.
>
> > As for GUCs to try to help the group of users who, *I'm certain*, will
> > have problems with PG13's plan choice. I think the overloaded
> > enable_hashagg option is a really nice compromise. We don't really
> > have any other executor node type that has multiple GUCs controlling
> > its behaviour, so I believe it would be nice to keep it that way.
...
> It would seem merge join has almost the same complexities as the new
> hash join code, since it can spill to disk doing sorts for merge joins,
> and adjusting work_mem is the only way to control that spill to disk. I
> don't remember anyone complaining about spills to disk during merge
> join, so I am unclear why we would need a such control for hash join.

It loooks like merge join was new in 8.3. I don't think that's a good analogy,
since the old behavior was still available with enable_mergejoin=off.

I think a better analogy would be if we now changed sort nodes beneath merge
join to use at most 0.5*work_mem, with no way of going back to using
1.0*work_mem.

--
Justin

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Bruce Momjian 2020-06-24 17:08:34 Re: Default setting for enable_hashagg_disk
Previous Message David Rowley 2020-06-24 12:24:29 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message ROS Didier 2020-06-24 12:39:38 RE: PostgreSQL and big data - FDW
Previous Message David Rowley 2020-06-24 12:24:29 Re: Default setting for enable_hashagg_disk