Re: Default setting for enable_hashagg_disk

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-25 19:56:07
Message-ID: 20200625195607.GA16766@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Thu, Jun 25, 2020 at 03:24:42PM -0400, Bruce Momjian wrote:
> On Thu, Jun 25, 2020 at 11:10:58AM -0700, Jeff Davis wrote:
> > On Wed, 2020-06-24 at 13:29 -0400, Tom Lane wrote:
> > > If we feel we need something to let people have the v12 behavior
> > > back, let's have
> > > (1) enable_hashagg on/off --- controls planner, same as it ever was
> > > (2) enable_hashagg_spill on/off --- controls executor by disabling
> > > spill
> > >
> > > But I'm not really convinced that we need (2).
> >
> > If we're not going to have a planner GUC, one alternative is to just
> > penalize the disk costs of HashAgg for a release or two. It would only
> > affect the cost of HashAgg paths that are expected to spill, which
> > weren't even generated in previous releases.
> >
> > In other words, multiply the disk costs by enough that the planner will
> > usually not choose HashAgg if expected to spill unless the average
> > group size is quite large (i.e. there are a lot more tuples than
> > groups, but still enough groups to spill).
>
> Well, the big question is whether this costing is actually more accurate
> than what we have now. What I am hearing is that spilling hash agg is
> expensive, so whatever we can do to reflect the actual costs seems like
> a win. If it can be done, it certainly seems better than a cost setting
> few people will use.

It is my understanding that spill of sorts is mostly read sequentially,
while hash reads are random. Is that right? Is that not being costed
properly?

That doesn't fix the misestimation case, but increasing work mem does
allow pre-PG 13 behavior there.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Alvaro Herrera 2020-06-25 20:02:12 Re: Please add a link to [BRIN] physical block ranges of a table
Previous Message Steven Pousty 2020-06-25 19:32:28 Re: Please add a link to [BRIN] physical block ranges of a table

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-06-25 19:57:00 Re: should libpq also require TLSv1.2 by default?
Previous Message Tom Lane 2020-06-25 19:46:30 Re: Open Item: Should non-text EXPLAIN always show properties?