Re: Memory-Bounded Hash Aggregation

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Adam Lee <ali(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2020-02-22 19:59:59
Message-ID: ee3784f2d54a37a8c93719704f57a8aa50a98d77.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2020-02-19 at 20:16 +0100, Tomas Vondra wrote:
> 5) Assert(nbuckets > 0);

...

> 6) Another thing that occurred to me was what happens to grouping
> sets,
> which we can't spill to disk. So I did this:

...

> which fails with segfault at execution time:

The biggest problem was that my grouping sets test was not testing
multiple hash tables spilling, so a couple bugs crept in. I fixed them,
thank you.

To fix the tests, I also had to fix the GUCs and the way the planner
uses them with my patch. In master, grouping sets are planned by
generating a path that tries to do as many grouping sets with hashing
as possible (limited by work_mem). But with my patch, the notion of
fitting hash tables in work_mem is not necessarily important. If we
ignore work_mem during path generation entirely (and only consider it
during costing and execution), it will change quite a few plans and
undermine the concept of mixed aggregates entirely. That may be a good
thing to do eventually as a simplification, but for now it seems like
too much, so I am preserving the notion of trying to fit hash tables in
work_mem to create mixed aggregates.

But that created the testing problem: I need a reliable way to get
grouping sets with several hash tables in memory that are all spilling,
but the planner is trying to avoid exactly that. So, I am introducing a
new GUC called enable_groupingsets_hash_disk (better name suggestions
welcome), defaulting it to "off" (and turned on during the test).

Additionally, I removed the other GUCs I introduced in earlier versions
of this patch. They were basically designed around the idea to revert
back to the previous hash aggregation behavior if desired (by setting
enable_hashagg_spill=false and hashagg_mem_overflow=true). That makes
some sense, but that was already covered pretty well by existing GUCs.
If you want to use HashAgg without spilling, just set work_mem higher;
and if you want to avoid the planner from choosing HashAgg at all, you
set enable_hashagg=false. So I just got rid of enable_hashagg_spill and
hashagg_mem_overflow.

I didn't forget about your explain-related suggestions. I'll address
them in the next patch.

Regards,
Jeff Davis

Attachment Content-Type Size
hashagg-20200222.patch text/x-patch 105.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michail Nikolaev 2020-02-22 20:08:45 Re: BUG #16108: Colorization to the output of command-line has unproperly behaviors at Windows platform
Previous Message Jeff Davis 2020-02-22 19:02:16 Re: Memory-Bounded Hash Aggregation