Re: Default setting for enable_hashagg_disk

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-24 18:32:03
Message-ID: 20200624183203.cnqcol27ruastit7@development
Lists: pgsql-docs pgsql-hackers

On Wed, Jun 24, 2020 at 01:29:56PM -0400, Tom Lane wrote:
>Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
>> On Wed, Jun 24, 2020 at 05:06:28AM -0400, Bruce Momjian wrote:
>>> It would seem merge join has almost the same complexities as the new
>>> hash join code, since it can spill to disk doing sorts for merge joins,
>>> and adjusting work_mem is the only way to control that spill to disk. I
>>> don't remember anyone complaining about spills to disk during merge
>>> join, so I am unclear why we would need such a control for hash join.
>
>> It looks like merge join was new in 8.3. I don't think that's a good analogy,
>> since the old behavior was still available with enable_mergejoin=off.
>
>Uh, what? A look into our git history shows immediately that
>nodeMergejoin.c has been there since the initial code import in 1996.
>
>I tend to agree with Bruce that it's not very obvious that we need
>another GUC knob here ... especially not one as ugly as this.
>I'm especially against the "neverspill" option, because that makes a
>single GUC that affects both the planner and executor independently.
>
>If we feel we need something to let people have the v12 behavior
>back, let's have
>(1) enable_hashagg on/off --- controls planner, same as it ever was
>(2) enable_hashagg_spill on/off --- controls executor by disabling spill
>

What if a user specifies

enable_hashagg = on
enable_hashagg_spill = off

and the estimates say the hashagg would need to spill to disk? Should
that disable the query (in which case the second GUC affects both
executor and planner) or run it (in which case we knowingly ignore
work_mem, which seems wrong)?
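
To make the two readings concrete, here is a toy model of the decision. It
is only a sketch, not PostgreSQL code: the Gucs class, the outcome()
function, est_hash_kb and the spill_guc_visible_to_planner flag are all
hypothetical names made up for illustration, and enable_hashagg_spill is
the GUC proposed above, not an existing setting.

```python
from dataclasses import dataclass


@dataclass
class Gucs:
    enable_hashagg: bool        # planner switch, as in the quoted proposal
    enable_hashagg_spill: bool  # executor switch from the quoted proposal
    work_mem_kb: int


def outcome(gucs: Gucs, est_hash_kb: int,
            spill_guc_visible_to_planner: bool) -> str:
    """Toy model only; est_hash_kb and the flag are hypothetical."""
    if not gucs.enable_hashagg:
        return "planner avoids hashagg"
    fits = est_hash_kb <= gucs.work_mem_kb
    if fits:
        return "hashagg within work_mem"
    if gucs.enable_hashagg_spill:
        return "hashagg spills to disk"
    # Estimated to overflow work_mem, and spilling is switched off.
    if spill_guc_visible_to_planner:
        # Reading 1: the "executor" GUC now changes planner behavior too.
        return "planner avoids hashagg"
    # Reading 2: the plan runs anyway and work_mem is knowingly ignored.
    return "hashagg exceeds work_mem"


gucs = Gucs(enable_hashagg=True, enable_hashagg_spill=False, work_mem_kb=4096)
print(outcome(gucs, 8192, spill_guc_visible_to_planner=True))
print(outcome(gucs, 8192, spill_guc_visible_to_planner=False))
```

With the settings above and an estimate larger than work_mem, the two calls
give different answers, which is exactly the ambiguity in question.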

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
