From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Default setting for enable_hashagg_disk |
Date: | 2020-06-10 16:40:59 |
Message-ID: | CAAKRu_ap3xVCkgMuVoof_vh3GM9HZtYjnv2uVPY93ZBfubt9LQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-docs pgsql-hackers |
On Tue, Jun 9, 2020 at 7:15 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> On Tue, Jun 09, 2020 at 06:20:13PM -0700, Melanie Plageman wrote:
> > On Thu, Apr 9, 2020 at 1:02 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> >
> > > 2. enable_groupingsets_hash_disk (default false):
> > >
> > > This is about how we choose which grouping sets to hash and which to
> > > sort when generating mixed mode paths.
> > >
> > > Even before this patch, there are quite a few paths that could be
> > > generated. It tries to estimate the size of each grouping set's hash
> > > table, and then see how many it can fit in work_mem (knapsack), while
> > > also taking advantage of any path keys, etc.
> > >
> > > With Disk-based Hash Aggregation, in principle we can generate paths
> > > representing any combination of hashing and sorting for the grouping
> > > sets. But that would be overkill (and grow to a huge number of paths if
> > > we have more than a handful of grouping sets). So I think the existing
> > > planner logic for grouping sets is fine for now. We might come up with
> > > a better approach later.
> > >
> > > But that created a testing problem, because if the planner estimates
> > > correctly, no hashed grouping sets will spill, and the spilling code
> > > won't be exercised. This GUC makes the planner disregard which grouping
> > > sets' hash tables will fit, making it much easier to exercise the
> > > spilling code. Is there a better way I should be testing this code
> > > path?
> >
> > So, I was catching up on email and noticed the last email in this
> > thread.
> >
> > I think I am not fully understanding what enable_groupingsets_hash_disk
> > does. Is it only for testing?
>
> If so, it should be in category: "Developer Options".
>
> > Using the tests you added to src/test/regress/sql/groupingsets.sql, I
> > did get a plan that looks like hashagg is spilling to disk (goes through
> > hashagg_spill_tuple() code path and has number of batches reported in
> > Explain) in a MixedAgg plan for a grouping sets query even with
> > enable_groupingsets_hash_disk set to false.
>
> > I'm not sure if this is more what you were looking for--or maybe I am
> > misunderstanding the guc.
>
> The behavior of the GUC is inconsistent with the other GUCs, which is
> confusing. See also Robert's comments in this thread.
> https://www.postgresql.org/message-id/20200407223900.GT2228%40telsasoft.com
>
> The old (pre-13) behavior was:
> - work_mem is the amount of RAM to which each query node tries to
> constrain
> itself, and the planner will reject a plan if it's expected to exceed
> that.
> ...But a chosen plan might exceed work_mem anyway.
>
> The new behavior in v13 seems to be:
> - HashAgg now respects work_mem, but instead enable*hash_disk are
> opportunisitic. A node which is *expected* to spill to disk will be
> rejected.
> ...But at execution time, a node which exceeds work_mem will be spilled.
>
> If someone sees a plan which spills to disk and wants to improve
> performance by
> avoid spilling, they might SET enable_hashagg_disk=off, which might do what
> they want (if the plan is rejected at plan time), or it might not, which I
> think will be a surprise every time.
>
>
But I thought that the enable_groupingsets_hash_disk GUC allows us to
test the following scenario:
The following is true:
- planner thinks grouping sets' hashtables table would fit in memory
(spilling is *not* expected)
- user is okay with spilling
- some grouping keys happen to be sortable and some hashable
The following happens:
- Planner generates some HashAgg grouping sets paths
- A MixedAgg plan is created
- During execution of the MixedAgg plan, one or more grouping sets'
hashtables would exceed work_mem, so the executor spills those tuples
to disk instead of exceeding work_mem
Especially given the code and comment:
/*
* If we have sortable columns to work with (gd->rollups is non-empty)
* and enable_groupingsets_hash_disk is disabled, don't generate
* hash-based paths that will exceed work_mem.
*/
if (!enable_groupingsets_hash_disk &&
hashsize > work_mem * 1024L && gd->rollups)
return; /* nope, won't fit */
If this is the scenario that the GUC is designed to test, it seems like
you could exercise it without the enable_groupingsets_hash_disk GUC by
lying about the stats, no?
--
Melanie Plageman
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2020-06-10 17:39:08 | Re: Default setting for enable_hashagg_disk |
Previous Message | Alvaro Herrera | 2020-06-10 15:53:09 | Re: some charts or graphs of possible permissions would be nice |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2020-06-10 17:37:07 | Re: Fast DSM segments |
Previous Message | Stephen Frost | 2020-06-10 16:21:40 | Re: Recording test runtimes with the buildfarm |