From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Default setting for enable_hashagg_disk |
Date: | 2020-06-10 01:20:13 |
Message-ID: | CAAKRu_aisiENMsPM2gC4oUY1hHG3yrCwY-fXUg22C6_MJUwQdA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-docs pgsql-hackers |
On Thu, Apr 9, 2020 at 1:02 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> 2. enable_groupingsets_hash_disk (default false):
>
> This is about how we choose which grouping sets to hash and which to
> sort when generating mixed mode paths.
>
> Even before this patch, there are quite a few paths that could be
> generated. It tries to estimate the size of each grouping set's hash
> table, and then see how many it can fit in work_mem (knapsack), while
> also taking advantage of any path keys, etc.
>
> With Disk-based Hash Aggregation, in principle we can generate paths
> representing any combination of hashing and sorting for the grouping
> sets. But that would be overkill (and grow to a huge number of paths if
> we have more than a handful of grouping sets). So I think the existing
> planner logic for grouping sets is fine for now. We might come up with
> a better approach later.
>
> But that created a testing problem, because if the planner estimates
> correctly, no hashed grouping sets will spill, and the spilling code
> won't be exercised. This GUC makes the planner disregard which grouping
> sets' hash tables will fit, making it much easier to exercise the
> spilling code. Is there a better way I should be testing this code
> path?
>
>
So, I was catching up on email and noticed the last email in this
thread.
I think I am not fully understanding what enable_groupingsets_hash_disk
does. Is it only for testing?
Using the tests you added to src/test/regress/sql/groupingsets.sql, I
did get a plan that looks like hashagg is spilling to disk (goes through
hashagg_spill_tuple() code path and has number of batches reported in
Explain) in a MixedAgg plan for a grouping sets query even with
enable_groupingsets_hash_disk set to false. You don't have the exact
query I tried (below) in the test suite, but it is basically what is
already there, so I must be missing something.
set enable_hashagg_disk = true;
SET enable_groupingsets_hash_disk = false;
SET work_mem='64kB';
set enable_hashagg = true;
set jit_above_cost = 0;
drop table if exists gs_hash_1;
create table gs_hash_1 as
select g1000, g100, g10, sum(g::numeric), count(*), max(g::text) from
(select g%1000 as g1000, g%100 as g100, g%10 as g10, g
from generate_series(0,199999) g) s
group by cube (g1000,g100,g10);
explain (analyze, costs off, timing off)
select g1000, g100, g10
from gs_hash_1 group by cube (g1000,g100,g10);
QUERY PLAN
--------------------------------------------------------------
MixedAggregate (actual rows=9648 loops=1)
Hash Key: g10
Hash Key: g10, g1000
Hash Key: g100
Hash Key: g100, g10
Group Key: g1000, g100, g10
Group Key: g1000, g100
Group Key: g1000
Group Key: ()
Peak Memory Usage: 233 kB
Disk Usage: 1600 kB
HashAgg Batches: 2333
-> Sort (actual rows=4211 loops=1)
Sort Key: g1000, g100, g10
Sort Method: external merge Disk: 384kB
-> Seq Scan on gs_hash_1 (actual rows=4211 loops=1)
Anyway, when I throw in the stats trick that is used in join_hash.sql:
alter table gs_hash_1 set (autovacuum_enabled = 'false');
update pg_class set reltuples = 10 where relname = 'gs_hash_1';
I get a MixedAgg plan that doesn't have any Sort below and uses much
more disk.
QUERY PLAN
----------------------------------------------------------
MixedAggregate (actual rows=4211 loops=1)
Hash Key: g1000, g100, g10
Hash Key: g1000, g100
Hash Key: g1000
Hash Key: g100, g10
Hash Key: g100
Hash Key: g10, g1000
Hash Key: g10
Group Key: ()
Peak Memory Usage: 405 kB
Disk Usage: 59712 kB
HashAgg Batches: 4209
-> Seq Scan on gs_hash_1 (actual rows=200000 loops=1)
I'm not sure if this is more what you were looking for--or maybe I am
misunderstanding the guc.
--
Melanie Plageman
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2020-06-10 02:15:44 | Re: Default setting for enable_hashagg_disk |
Previous Message | PG Doc comments form | 2020-06-09 21:35:41 | Signed-ness of ints is unclear in FE-BE protocol docs |
From | Date | Subject | |
---|---|---|---|
Next Message | torikoshia | 2020-06-10 01:50:58 | Re: Is it useful to record whether plans are generic or custom? |
Previous Message | Andres Freund | 2020-06-10 00:12:54 | Re: elog(DEBUG2 in SpinLocked section. |