From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: explain HashAggregate to report bucket and memory stats
Date: 2020-03-06 17:58:59
Message-ID: 20200306175859.d56ohskarwldyrrw@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2020-03-01 09:45:40 -0600, Justin Pryzby wrote:
> +/*
> + * Show hash bucket stats and (optionally) memory.
> + */
> +static void
> +show_tuplehash_info(HashTableInstrumentation *inst, ExplainState *es)
> +{
> + long spacePeakKb_tuples = (inst->space_peak_tuples + 1023) / 1024,
> + spacePeakKb_hash = (inst->space_peak_hash + 1023) / 1024;
Let's not add further uses of long. It's terrible because it has
different widths on 64bit windows and everything else. Specify the
widths explicitly, or use something like size_t.
> + if (es->format != EXPLAIN_FORMAT_TEXT)
> + {
> + ExplainPropertyInteger("Hash Buckets", NULL,
> + inst->nbuckets, es);
> + ExplainPropertyInteger("Original Hash Buckets", NULL,
> + inst->nbuckets_original, es);
> + ExplainPropertyInteger("Peak Memory Usage (hashtable)", "kB",
> + spacePeakKb_hash, es);
> + ExplainPropertyInteger("Peak Memory Usage (tuples)", "kB",
> + spacePeakKb_tuples, es);
And then you're passing the long to ExplainPropertyInteger, which accepts
an int64, making the use of long above particularly suspicious.
I wonder if it would make sense to add a ExplainPropertyBytes(), that
would do the rounding etc automatically. It could then also switch units
as appropriate.
> + }
> + else if (!inst->nbuckets)
> + ; /* Do nothing */
> + else
> + {
> + if (inst->nbuckets_original != inst->nbuckets)
> + {
> + ExplainIndentText(es);
> + appendStringInfo(es->str,
> + "Buckets: %ld (originally %ld)",
> + inst->nbuckets,
> + inst->nbuckets_original);
> + }
> + else
> + {
> + ExplainIndentText(es);
> + appendStringInfo(es->str,
> + "Buckets: %ld",
> + inst->nbuckets);
> + }
> +
> + if (es->analyze)
> + appendStringInfo(es->str,
> + " Memory Usage: hashtable: %ldkB, tuples: %ldkB",
> + spacePeakKb_hash, spacePeakKb_tuples);
> + appendStringInfoChar(es->str, '\n');
I'm not sure I like the alternative output formats here. All the other
fields are separated with a comma, but the original size is in
parens. I'd probably just format it as "Buckets: %lld " and then add
", Original Buckets: %lld" when differing.
Also, %ld is problematic, because it's only 32bit wide on some platforms
(including 64bit windows), but 64bit on others. The easiest way to deal
with that is to use %lld and cast the argument to long long - yes that's
a sad workaround.
> +/* Update instrumentation stats */
> +void
> +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> +{
> + hashtable->instrument.nbuckets = hashtable->hashtab->size;
> + if (initial)
> + {
> + hashtable->instrument.nbuckets_original = hashtable->hashtab->size;
> + hashtable->instrument.space_peak_hash = hashtable->hashtab->size *
> + sizeof(TupleHashEntryData);
> + hashtable->instrument.space_peak_tuples = 0;
> + }
> + else
> + {
> +#define maxself(a,b) a=Max(a,b)
> + /* hashtable->entrysize includes additionalsize */
> + maxself(hashtable->instrument.space_peak_hash,
> + hashtable->hashtab->size * sizeof(TupleHashEntryData));
> + maxself(hashtable->instrument.space_peak_tuples,
> + hashtable->hashtab->members * hashtable->entrysize);
> +#undef maxself
> + }
> +}
Not a fan of this macro.
I'm also not sure I understand what you're trying to do here?
> diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> index f457b5b150..b173b32cab 100644
> --- a/src/test/regress/expected/aggregates.out
> +++ b/src/test/regress/expected/aggregates.out
> @@ -517,10 +517,11 @@ order by 1, 2;
> -> HashAggregate
> Output: s2.s2, sum((s1.s1 + s2.s2))
> Group Key: s2.s2
> + Buckets: 4
> -> Function Scan on pg_catalog.generate_series s2
> Output: s2.s2
> Function Call: generate_series(1, 3)
> -(14 rows)
> +(15 rows)
These tests probably won't be portable. The number of hash buckets
calculated will e.g. depend on the size of the contained elements, and
that will in turn depend on whether pointers are 4 or 8 bytes.
Greetings,
Andres Freund