Re: Default setting for enable_hashagg_disk

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-11 00:48:17
Message-ID: CAAKRu_YsUV2jwR4WKv=UHoZDh3DJm-5BGfkh=_CZnChFBSYMwA@mail.gmail.com
Lists: pgsql-docs pgsql-hackers

On Wed, Jun 10, 2020 at 10:39 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Tue, 2020-06-09 at 18:20 -0700, Melanie Plageman wrote:
> > So, I was catching up on email and noticed the last email in this
> > thread.
> >
> > I think I am not fully understanding what
> > enable_groupingsets_hash_disk
> > does. Is it only for testing?
>
> It's mostly for testing. I could imagine cases where it would be useful
> to force groupingsets to use the disk, but I mainly wanted the setting
> there for testing the grouping sets hash disk code path.
>
> > Using the tests you added to src/test/regress/sql/groupingsets.sql, I
> > did get a plan that looks like hashagg is spilling to disk (goes
> > through
>
> I had something that worked as a test for a while, but then when I
> tweaked the costing, it started using the Sort path (therefore not
> testing my grouping sets hash disk code at all) and a bug crept in. So
> I thought it would be best to have a more forceful knob.
>
> Perhaps I should just get rid of that GUC and use the stats trick?
>
>
I like the idea of doing the stats trick. For extra insurance, you could
throw in that other trick used in groupingsets.sql and make some of the
grouping columns unhashable and some unsortable, so the planner can't
pick only the Sort path and do just a GroupAgg.

This slight modification of my previous example will probably yield
consistent results:

set enable_hashagg_disk = true;
SET enable_groupingsets_hash_disk = false;
SET work_mem='64kB';
SET enable_hashagg = true;
drop table if exists gs_hash_1;
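-- g_unsortable (xid, no btree opclass) and g_unhashable (bit, no hash
-- opclass) are the groupingsets.sql trick: no single Sort or Hash path can
-- cover all three grouping sets, so we should always get a MixedAggregate.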
create table gs_hash_1 as
select g%1000 as g1000, g%100 as g100, g%10 as g10, g,
g::text::xid as g_unsortable, g::bit(4) as g_unhashable
from generate_series(0,199999) g;
analyze gs_hash_1;

alter table gs_hash_1 set (autovacuum_enabled = 'false');
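-- The "stats trick": with autovacuum disabled so autoanalyze can't undo it,
-- lie about reltuples so the planner thinks the hash table fits in work_mem
-- and picks the hash path; at execution the real 200k rows force the spill.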
update pg_class set reltuples = 10 where relname = 'gs_hash_1';

explain (analyze, costs off, timing off)
select g1000, g100, g10
from gs_hash_1
group by grouping sets ((g1000,g100), (g10, g_unhashable), (g100,
g_unsortable));

                           QUERY PLAN
----------------------------------------------------------------
 MixedAggregate (actual rows=201080 loops=1)
   Hash Key: g100, g_unsortable
   Group Key: g1000, g100
   Sort Key: g10, g_unhashable
     Group Key: g10, g_unhashable
   Peak Memory Usage: 109 kB
   Disk Usage: 13504 kB
   HashAgg Batches: 10111
   ->  Sort (actual rows=200000 loops=1)
         Sort Key: g1000, g100
         Sort Method: external merge  Disk: 9856kB
         ->  Seq Scan on gs_hash_1 (actual rows=200000 loops=1)

While we are on the topic of the tests, I was wondering if you had
considered making a user-defined type with a lot of padding so that the
tests could use fewer rows. I did this for adaptive hashjoin and it
helped with iteration time. I don't know if that is still the kind of
test you are looking for, since a user probably wouldn't have a couple
hundred really fat untoasted tuples, but I thought I would check whether
it would be useful.
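
In case it helps, here is a rough, untested sketch of the idea in plain
SQL, padding the grouping key with a wide text value instead of defining
a real padded type (the gs_hash_padded table, its columns, and the row
counts are invented and would need tuning, and the planner may still need
the reltuples trick above to choose the hash path):

set work_mem = '64kB';
drop table if exists gs_hash_padded;
-- ~400 bytes of padding per key stays inline (untoasted) but makes each
-- hash table entry wide, so a few thousand rows can exceed 64kB of
-- work_mem instead of needing 200,000 rows.
create table gs_hash_padded as
select repeat('x', 400) || (g % 500)::text as g500_wide,
       g % 10 as g10
from generate_series(0, 4999) g;
analyze gs_hash_padded;

explain (analyze, costs off, timing off)
select g500_wide, g10, count(*)
from gs_hash_padded
group by grouping sets ((g500_wide), (g10));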

--
Melanie Plageman
