Re: Memory-Bounded Hash Aggregation

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Adam Lee <ali(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2020-03-12 21:01:46
Message-ID: 20200312210146.GG29065@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 11, 2020 at 11:55:35PM -0700, Jeff Davis wrote:
> * tweaked EXPLAIN output some more
> Unless I (or someone else) finds something significant, this is close
> to commit.

Thanks for working on this ; I finally made a pass over the patch.

+++ b/doc/src/sgml/config.sgml
+ <term><varname>enable_groupingsets_hash_disk</varname> (<type>boolean</type>)
+ Enables or disables the query planner's use of hashed aggregation for
+ grouping sets when the size of the hash tables is expected to exceed
+ <varname>work_mem</varname>. See <xref
+ linkend="queries-grouping-sets"/>. Note that this setting only
+ affects the chosen plan; execution time may still require using
+ disk-based hash aggregation. ...
...
+ <term><varname>enable_hashagg_disk</varname> (<type>boolean</type>)
+ ... This only affects the planner choice;
+ execution time may still require using disk-based hash
+ aggregation. The default is <literal>on</literal>.

I don't understand what's meant by "the chosen plan".
Should it say, "at execution ..." instead of "execution time" ?

+ Enables or disables the query planner's use of hashed aggregation plan
+ types when the memory usage is expected to exceed

Either remove "plan types" for consistency with enable_groupingsets_hash_disk,
Or add it there. Maybe it should say "when the memory usage would OTHERWISE BE
expected to exceed.."

+show_hashagg_info(AggState *aggstate, ExplainState *es)
+{
+ Agg *agg = (Agg *)aggstate->ss.ps.plan;
+ long memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024;

I see this partially duplicates my patch [0] to show memory stats for (at
Andres' suggestion) all of execGrouping.c. Perhaps you'd consider naming the
function something more generic in case my patch progresses ? I'm using:
|show_tuplehash_info(HashTableInstrumentation *inst, ExplainState *es);

Mine also shows:
|ExplainPropertyInteger("Original Hash Buckets", NULL,
|ExplainPropertyInteger("Peak Memory Usage (hashtable)", "kB",
|ExplainPropertyInteger("Peak Memory Usage (tuples)", "kB",

[0] https://www.postgresql.org/message-id/20200306213310.GM684%40telsasoft.com

You added hash_mem_peak and hash_batches_used to struct AggState.
In my 0001 patch, I added instrumentation to struct TupleHashTable, and in my
0005 patch I move it into AggStatePerHashData and other State nodes.

+ if (from_tape)
+ partition_mem += HASHAGG_READ_BUFFER_SIZE;
+ partition_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;

=> That looks wrong ; should say += ?

+ gettext_noop("Enables the planner's use of hashed aggregation plans that are expected to exceed work_mem."),

should say:
"when the memory usage is otherwise be expected to exceed.."

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-03-12 21:53:11 Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Previous Message Daniel Gustafsson 2020-03-12 21:00:33 Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library