Re: explain HashAggregate to report bucket and memory stats

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: explain HashAggregate to report bucket and memory stats
Date: 2020-03-09 17:33:55
Message-ID: 20200309173355.bdnj33ffm7xuwquy@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-03-06 15:33:10 -0600, Justin Pryzby wrote:
> On Fri, Mar 06, 2020 at 09:58:59AM -0800, Andres Freund wrote:
> > > + ExplainIndentText(es);
> > > + appendStringInfo(es->str,
> > > + "Buckets: %ld (originally %ld)",
> > > + inst->nbuckets,
> > > + inst->nbuckets_original);
> >
> > I'm not sure I like the alternative output formats here. All the other
> > fields are separated with a comma, but the original size is in
> > parens. I'd probably just format it as "Buckets: %lld " and then add
> > ", Original Buckets: %lld" when differing.
>
> It's done that way for consistency with hashJoin in show_hash_info().

Fair. I don't like it, but it's not this patch's fault.

> > > diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> > > index f457b5b150..b173b32cab 100644
> > > --- a/src/test/regress/expected/aggregates.out
> > > +++ b/src/test/regress/expected/aggregates.out
> > > @@ -517,10 +517,11 @@ order by 1, 2;
> > > -> HashAggregate
> > > Output: s2.s2, sum((s1.s1 + s2.s2))
> > > Group Key: s2.s2
> > > + Buckets: 4
> > > -> Function Scan on pg_catalog.generate_series s2
> > > Output: s2.s2
> > > Function Call: generate_series(1, 3)
> > > -(14 rows)
> > > +(15 rows)
> >
> > These tests probably won't be portable. The number of hash buckets
> > calculated will e.g. depend onthe size of the contained elements. And
> > that'll e.g. will depend on whether pointers are 4 or 8 bytes.
>
> I was aware and afraid of that. Previously, I added this output only to
> "explain analyze", and (as an quick, interim implementation) changed various
> tests to use analyze, and memory only shown in "verbose" mode. But as Tomas
> pointed out, that's consistent with what's done elsewhere.
>
> So is the solution to show stats only during explain ANALYZE ?
>
> Or ... I have a patch to create a new explain(MACHINE) option to allow more
> stable output, by avoiding Memory/Disk. That doesn't attempt to make all
> "explain analyze" output stable - there's other issues, I think mostly related
> to parallel workers (see 4ea03f3f, 13e8b2ee). But does allow retiring
> explain_sq_limit and explain_parallel_sort_stats. I'm including my patch to
> show what I mean, but I didn't enable it for hashtable "Buckets:". I guess in
> either case, the tests shouldn't be included.

Yea, there's been recent discussion about an argument like that. See
e.g.
https://www.postgresql.org/message-id/18494.1580079189%40sss.pgh.pa.us

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-03-09 17:37:03 Re: Fastpath while arranging the changes in LSN order in logical decoding
Previous Message Robert Haas 2020-03-09 17:29:47 Re: DROP and ddl_command_end.