
Re: Default setting for enable_hashagg_disk

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Default setting for enable_hashagg_disk
Date:	2020-04-09 20:02:07
Message-ID:	6292f5a766198d5745eb8cc17ac736366056671d.camel@j-davis.com
Lists:	pgsql-docs pgsql-hackers

On Thu, 2020-04-09 at 15:26 -0400, Robert Haas wrote:
> I think it's actually pretty different. All of the other enable_*
> GUCs disable an entire type of plan node, except for cases where that
> would otherwise result in planning failure. This just disables a
> portion of the planning logic for a certain kind of node, without
> actually disabling the whole node type. I'm not sure that's a bad
> idea, but it definitely seems to be inconsistent with what we've done
> in the past.

The patch adds two GUCs. Both are slightly weird, to be honest, but let
me explain the reasoning. I am open to other suggestions.

1. enable_hashagg_disk (default true):

This is essentially there just to get some of the old behavior back, to
give people an escape hatch if they see bad plans while we are tweaking
the costing. The old behavior was weird, so this GUC is also weird.

Perhaps we can make this a compatibility GUC that we eventually drop? I
don't necessarily think this GUC would make sense, say, 5 versions from
now. I'm just trying to be conservative because I know that, even if
the plans are faster for 90% of people, the other 10% will be unhappy
and want a way to work around it.
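A minimal sketch of the planning-time gate described in item 1, written in Python purely for illustration. It assumes the old rule was: generate a hashed aggregate path only if its estimated hash table fits in work_mem, unless that would leave no path at all. The function and parameter names are hypothetical and are not the planner's actual code.

```python
def consider_hashagg(est_hashtable_bytes: int,
                     work_mem_kb: int,
                     enable_hashagg_disk: bool,
                     have_other_paths: bool) -> bool:
    """Hypothetical sketch: should a hashed aggregate path be generated?"""
    if enable_hashagg_disk:
        # New behavior: hash aggregation can spill to disk, so the estimated
        # size alone never rules it out; costing picks hash vs. sort.
        return True
    fits = est_hashtable_bytes < work_mem_kb * 1024
    # Old-style behavior (assumed): skip hashing when the table would exceed
    # work_mem, unless no other path exists (avoids planning failure).
    return fits or not have_other_paths


# 10 MB estimated hash table, work_mem = 4MB, a sorted path also exists.
print(consider_hashagg(10 * 1024 * 1024, 4096, False, True))  # False
print(consider_hashagg(10 * 1024 * 1024, 4096, True, True))   # True
```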

2. enable_groupingsets_hash_disk (default false):

This is about how we choose which grouping sets to hash and which to
sort when generating mixed mode paths.

Even without this patch, quite a few paths can be generated. The
planner tries to estimate the size of each grouping set's hash table,
and then sees how many it can fit in work_mem (a knapsack problem),
while also taking advantage of any path keys, etc.
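The knapsack step can be sketched as a 0/1 knapsack over estimated hash table sizes. This sketch assumes every grouping set is worth the same, so the goal is simply to hash as many sets as fit together in work_mem; sizes are in KB, all names are hypothetical, and the path-key considerations mentioned above are ignored.

```python
from typing import Dict, List


def choose_sets_to_hash(est_kb: Dict[str, int], work_mem_kb: int) -> List[str]:
    """0/1 knapsack: pick the largest number of grouping sets whose estimated
    hash tables fit together in work_mem. The remaining sets would be sorted."""
    # best[w] = sets achieving the maximum count with total size <= w
    best: List[List[str]] = [[] for _ in range(work_mem_kb + 1)]
    for name, size in est_kb.items():
        # Walk capacities downward so each set is chosen at most once.
        for w in range(work_mem_kb, size - 1, -1):
            candidate = best[w - size] + [name]
            if len(candidate) > len(best[w]):
                best[w] = candidate
    return best[work_mem_kb]


est = {"(a)": 300, "(b)": 2500, "(a,b)": 3000, "()": 1}
hashed = choose_sets_to_hash(est, work_mem_kb=4096)
print(hashed, "->", sum(est[n] for n in hashed), "KB")
```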

With Disk-based Hash Aggregation, in principle we can generate paths
representing any combination of hashing and sorting for the grouping
sets. But that would be overkill (and grow to a huge number of paths if
we have more than a handful of grouping sets). So I think the existing
planner logic for grouping sets is fine for now. We might come up with
a better approach later.
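To put a number on "a huge number of paths": if each grouping set is independently either hashed or sorted, n grouping sets give 2^n assignments, before even considering sort orders for the sorted sets. A hypothetical enumeration:

```python
from itertools import product
from typing import Dict, List, Sequence


def hash_sort_assignments(grouping_sets: Sequence[str]) -> List[Dict[str, str]]:
    """Every way to mark each grouping set as hashed or sorted."""
    return [dict(zip(grouping_sets, choice))
            for choice in product(("hash", "sort"), repeat=len(grouping_sets))]


print(len(hash_sort_assignments(["(a)", "(b)", "(c)"])))  # 8 == 2**3
for n in (4, 8, 12):
    print(n, "sets ->", 2 ** n, "assignments")
```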

But that created a testing problem, because if the planner estimates
correctly, no hashed grouping sets will spill, and the spilling code
won't be exercised. This GUC makes the planner disregard which grouping
sets' hash tables will fit, making it much easier to exercise the
spilling code. Is there a better way I should be testing this code
path?
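A hypothetical sketch of what the testing GUC changes, assuming "disregard which hash tables will fit" means every set is treated as fitting. Normal mode hashes sets smallest-first while they fit (for equally weighted sets this maximizes the count, like the knapsack); test mode ignores the budget, so the hashed sets' combined estimate exceeds work_mem and spilling is expected. The parameter hash_disk stands in for enable_groupingsets_hash_disk; none of this is the planner's actual code.

```python
from typing import Dict, List, Tuple


def plan_grouping_sets(est_kb: Dict[str, int], work_mem_kb: int,
                       hash_disk: bool) -> Tuple[List[str], List[str]]:
    """Return (hashed, sorted) grouping sets."""
    hashed: List[str] = []
    sorted_sets: List[str] = []
    used = 0
    for name in sorted(est_kb, key=lambda n: est_kb[n]):  # smallest first
        size = est_kb[name]
        if hash_disk or used + size <= work_mem_kb:
            hashed.append(name)
            used += size
        else:
            sorted_sets.append(name)
    return hashed, sorted_sets


def spill_expected(hashed: List[str], est_kb: Dict[str, int],
                   work_mem_kb: int) -> bool:
    """Spilling is expected when the hashed sets' estimates exceed work_mem."""
    return sum(est_kb[n] for n in hashed) > work_mem_kb


est = {"(a)": 300, "(b)": 2500, "(a,b)": 3000}
for flag in (False, True):
    h, s = plan_grouping_sets(est, 4096, flag)
    print(flag, h, s, spill_expected(h, est, 4096))
```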

Regards,
Jeff Davis

In response to

Re: Default setting for enable_hashagg_disk at 2020-04-09 19:26:36 from Robert Haas

Responses

Re: Default setting for enable_hashagg_disk at 2020-06-10 01:20:13 from Melanie Plageman
