Re: explain HashAggregate to report bucket and memory stats

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: explain HashAggregate to report bucket and memory stats
Date: 2020-03-06 21:33:10
Message-ID: 20200306213310.GM684@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 06, 2020 at 09:58:59AM -0800, Andres Freund wrote:
> > + ExplainIndentText(es);
> > + appendStringInfo(es->str,
> > + "Buckets: %ld (originally %ld)",
> > + inst->nbuckets,
> > + inst->nbuckets_original);
>
> I'm not sure I like the alternative output formats here. All the other
> fields are separated with a comma, but the original size is in
> parens. I'd probably just format it as "Buckets: %lld " and then add
> ", Original Buckets: %lld" when differing.

It's done that way for consistency with hashJoin in show_hash_info().

> > +/* Update instrumentation stats */
> > +void
> > +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> > +{
> > + hashtable->instrument.nbuckets = hashtable->hashtab->size;
> > + if (initial)
> > + {
> > + hashtable->instrument.nbuckets_original = hashtable->hashtab->size;
> > + hashtable->instrument.space_peak_hash = hashtable->hashtab->size *
> > + sizeof(TupleHashEntryData);
> > + hashtable->instrument.space_peak_tuples = 0;
> > + }
> > + else
> > + {
> > +#define maxself(a,b) a=Max(a,b)
> > + /* hashtable->entrysize includes additionalsize */
> > + maxself(hashtable->instrument.space_peak_hash,
> > + hashtable->hashtab->size * sizeof(TupleHashEntryData));
> > + maxself(hashtable->instrument.space_peak_tuples,
> > + hashtable->hashtab->members * hashtable->entrysize);
> > +#undef maxself
> > + }
> > +}
>
> Not a fan of this macro.
>
> I'm also not sure I understand what you're trying to do here?

I have to call UpdateTupleHashTableStats() from the callers at deliberate
locations. If the caller fills the hashtable all at once, I can populate the
stats immediately after that, but if it's populated incrementally, then need to
update stats right before it's destroyed or reset, otherwise we can show tuple
size of the hashtable since its most recent reset, rather than a larger,
previous incarnation.

> > diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> > index f457b5b150..b173b32cab 100644
> > --- a/src/test/regress/expected/aggregates.out
> > +++ b/src/test/regress/expected/aggregates.out
> > @@ -517,10 +517,11 @@ order by 1, 2;
> > -> HashAggregate
> > Output: s2.s2, sum((s1.s1 + s2.s2))
> > Group Key: s2.s2
> > + Buckets: 4
> > -> Function Scan on pg_catalog.generate_series s2
> > Output: s2.s2
> > Function Call: generate_series(1, 3)
> > -(14 rows)
> > +(15 rows)
>
> These tests probably won't be portable. The number of hash buckets
> calculated will e.g. depend onthe size of the contained elements. And
> that'll e.g. will depend on whether pointers are 4 or 8 bytes.

I was aware and afraid of that. Previously, I added this output only to
"explain analyze", and (as an quick, interim implementation) changed various
tests to use analyze, and memory only shown in "verbose" mode. But as Tomas
pointed out, that's consistent with what's done elsewhere.

So is the solution to show stats only during explain ANALYZE ?

Or ... I have a patch to create a new explain(MACHINE) option to allow more
stable output, by avoiding Memory/Disk. That doesn't attempt to make all
"explain analyze" output stable - there's other issues, I think mostly related
to parallel workers (see 4ea03f3f, 13e8b2ee). But does allow retiring
explain_sq_limit and explain_parallel_sort_stats. I'm including my patch to
show what I mean, but I didn't enable it for hashtable "Buckets:". I guess in
either case, the tests shouldn't be included.

--
Justin

Attachment Content-Type Size
v7-0001-explain-to-show-tuplehash-bucket-and-memory-stats.patch text/x-diff 88.7 KB
v7-0002-refactor-show_grouping_set_keys.patch text/x-diff 3.0 KB
v7-0003-Gross-hack-to-put-hash-stats-of-subplans-in-the-r.patch text/x-diff 5.4 KB
v7-0004-implement-hash-stats-for-bitmapHeapScan.patch text/x-diff 6.3 KB
v7-0005-Refactor-for-consistency-symmetry.patch text/x-diff 14.8 KB
v7-0006-TupleHashTable.entrysize-was-unused-except-for-in.patch text/x-diff 1.6 KB
v7-0007-Update-comment-obsolete-since-69c3936a.patch text/x-diff 891 bytes
v7-0008-Add-explain-MACHINE.patch text/x-diff 12.6 KB
v7-0009-Add-explain-REGRESS.patch text/x-diff 5.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dent John 2020-03-06 21:36:15 Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR
Previous Message Thomas Munro 2020-03-06 21:33:03 Re: effective_io_concurrency's steampunk spindle maths