From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: explain HashAggregate to report bucket and memory stats
Date: 2020-03-06 17:58:59
Message-ID: 20200306175859.d56ohskarwldyrrw@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-03-01 09:45:40 -0600, Justin Pryzby wrote:
> +/*
> + * Show hash bucket stats and (optionally) memory.
> + */
> +static void
> +show_tuplehash_info(HashTableInstrumentation *inst, ExplainState *es)
> +{
> +	long	spacePeakKb_tuples = (inst->space_peak_tuples + 1023) / 1024,
> +			spacePeakKb_hash = (inst->space_peak_hash + 1023) / 1024;

Let's not add further uses of long. It's a terrible type because it has
different widths on 64-bit Windows and everything else. Specify the
width explicitly, or use something like size_t.
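To make the point concrete, here is a minimal sketch (not from the patch;
the helper name is made up) of the kB rounding using a fixed-width type
instead of long:

```c
#include <stddef.h>
#include <stdint.h>

/* On 64-bit Windows (LLP64), long is 32 bits; on 64-bit Linux (LP64) it
 * is 64 bits.  size_t and int64_t keep a predictable width everywhere. */
static size_t
bytes_to_kb(size_t bytes)
{
	/* round up to the next kilobyte, as the patch does */
	return (bytes + 1023) / 1024;
}
```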

> +	if (es->format != EXPLAIN_FORMAT_TEXT)
> +	{
> +		ExplainPropertyInteger("Hash Buckets", NULL,
> +							   inst->nbuckets, es);
> +		ExplainPropertyInteger("Original Hash Buckets", NULL,
> +							   inst->nbuckets_original, es);
> +		ExplainPropertyInteger("Peak Memory Usage (hashtable)", "kB",
> +							   spacePeakKb_hash, es);
> +		ExplainPropertyInteger("Peak Memory Usage (tuples)", "kB",
> +							   spacePeakKb_tuples, es);

And then you're passing the long to ExplainPropertyInteger(), which
accepts an int64, making the use of long above particularly suspicious.

I wonder if it would make sense to add a ExplainPropertyBytes(), that
would do the rounding etc automatically. It could then also switch units
as appropriate.
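A rough sketch of what the unit-switching part of such a helper could look
like (the function name and cut-offs here are hypothetical; the real thing
would live in explain.c and call ExplainPropertyInteger() internally):

```c
#include <stdint.h>

/* Hypothetical: pick a human-friendly unit and round up, so callers can
 * pass raw byte counts instead of pre-dividing to kB themselves. */
static void
format_bytes(int64_t bytes, int64_t *value, const char **unit)
{
	if (bytes >= (int64_t) 1024 * 1024 * 1024)
	{
		*value = (bytes + (1 << 30) - 1) / (1 << 30);
		*unit = "GB";
	}
	else if (bytes >= 1024 * 1024)
	{
		*value = (bytes + (1 << 20) - 1) / (1 << 20);
		*unit = "MB";
	}
	else
	{
		*value = (bytes + 1023) / 1024;
		*unit = "kB";
	}
}
```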

> +	}
> +	else if (!inst->nbuckets)
> +		; /* Do nothing */
> +	else
> +	{
> +		if (inst->nbuckets_original != inst->nbuckets)
> +		{
> +			ExplainIndentText(es);
> +			appendStringInfo(es->str,
> +							 "Buckets: %ld (originally %ld)",
> +							 inst->nbuckets,
> +							 inst->nbuckets_original);
> +		}
> +		else
> +		{
> +			ExplainIndentText(es);
> +			appendStringInfo(es->str,
> +							 "Buckets: %ld",
> +							 inst->nbuckets);
> +		}
> +
> +		if (es->analyze)
> +			appendStringInfo(es->str,
> +							 " Memory Usage: hashtable: %ldkB, tuples: %ldkB",
> +							 spacePeakKb_hash, spacePeakKb_tuples);
> +		appendStringInfoChar(es->str, '\n');

I'm not sure I like the alternative output formats here. All the other
fields are separated with commas, but the original size is in
parens. I'd probably just format it as "Buckets: %lld" and then append
", Original Buckets: %lld" when they differ.

Also, %ld is problematic, because long is only 32 bits wide on some
platforms (including 64-bit Windows) but 64 bits on others. The easiest
way to deal with that is to use %lld and cast the argument to long long
- yes, that's a sad workaround.
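The workaround in code form (a generic sketch, not a line from the patch):

```c
#include <stdint.h>
#include <stdio.h>

/* Portable way to print a 64-bit value with printf-style formats:
 * cast to long long and use %lld, since the width of %ld varies
 * across platforms. */
static int
format_buckets(char *buf, size_t buflen, int64_t nbuckets)
{
	return snprintf(buf, buflen, "Buckets: %lld", (long long) nbuckets);
}
```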

> +/* Update instrumentation stats */
> +void
> +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> +{
> +	hashtable->instrument.nbuckets = hashtable->hashtab->size;
> +	if (initial)
> +	{
> +		hashtable->instrument.nbuckets_original = hashtable->hashtab->size;
> +		hashtable->instrument.space_peak_hash = hashtable->hashtab->size *
> +			sizeof(TupleHashEntryData);
> +		hashtable->instrument.space_peak_tuples = 0;
> +	}
> +	else
> +	{
> +#define maxself(a,b) a=Max(a,b)
> +		/* hashtable->entrysize includes additionalsize */
> +		maxself(hashtable->instrument.space_peak_hash,
> +				hashtable->hashtab->size * sizeof(TupleHashEntryData));
> +		maxself(hashtable->instrument.space_peak_tuples,
> +				hashtable->hashtab->members * hashtable->entrysize);
> +#undef maxself
> +	}
> +}

Not a fan of this macro.

I'm also not sure I understand what you're trying to do here?
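FWIW, written out without the macro, the high-water-mark bookkeeping can be
a plain helper (a sketch; Max here stands in for the Max() macro from c.h):

```c
#include <stdint.h>

#define Max(a, b) ((a) > (b) ? (a) : (b))	/* as in PostgreSQL's c.h */

/* Keep a running peak without the maxself() helper macro. */
static void
update_peak(uint64_t *peak, uint64_t current)
{
	*peak = Max(*peak, current);
}
```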

> diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> index f457b5b150..b173b32cab 100644
> --- a/src/test/regress/expected/aggregates.out
> +++ b/src/test/regress/expected/aggregates.out
> @@ -517,10 +517,11 @@ order by 1, 2;
> -> HashAggregate
> Output: s2.s2, sum((s1.s1 + s2.s2))
> Group Key: s2.s2
> + Buckets: 4
> -> Function Scan on pg_catalog.generate_series s2
> Output: s2.s2
> Function Call: generate_series(1, 3)
> -(14 rows)
> +(15 rows)

These tests probably won't be portable. The number of hash buckets
calculated will e.g. depend on the size of the contained elements, and
that in turn depends on whether pointers are 4 or 8 bytes.

Greetings,

Andres Freund
