Re: explain HashAggregate to report bucket and memory stats

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: explain HashAggregate to report bucket and memory stats
Date: 2020-02-16 00:02:20
Message-ID: 20200216000220.GF31889@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 03, 2020 at 06:53:01AM -0800, Andres Freund wrote:
> On 2020-01-03 10:19:26 -0600, Justin Pryzby wrote:
> > On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> > https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > > What would I find very useful is [...] if the HashAggregate node under
> > > "explain analyze" would report memory and bucket stats; and if the Aggregate
> > > node would report...anything.
>
> Yea, that'd be amazing. It probably should be something every
> execGrouping.c using node can opt into.

Do you think it should be implemented in execGrouping/TupleHashTableData (as I
did) ? I also did an experiment moving into the higher level nodes, but I
guess that's not actually desirable. There's currently different output from
tests between the implementation using execGrouping.c and the one outside it,
so there's at least an issue with grouping sets.

> > + hashtable->hinstrument.nbuckets_original = nbuckets;
> > + hashtable->hinstrument.nbuckets = nbuckets;
> > + hashtable->hinstrument.space_peak = entrysize * hashtable->hashtab->size;
>
> That's not actually an accurate accounting of memory, because for filled
> entries a lot of memory is used to store actual tuples:

Thanks - I think I finally understood this.

I updated some existing tests to show the new output. I imagine that's a
throwaway commit, and should eventually add new tests for each of these node
types under explain analyze.

I've been testing the various nodes like:

--heapscan:
DROP TABLE t; CREATE TABLE t (i int unique) WITH(autovacuum_enabled=off); INSERT INTO t SELECT generate_series(1,99999); SET enable_seqscan=off; SET parallel_tuple_cost=0; SET parallel_setup_cost=0; SET enable_indexonlyscan=off; explain analyze verbose SELECT * FROM t WHERE i BETWEEN 999 and 99999999;

--setop:
explain( analyze,verbose) SELECT * FROM generate_series(1,999) EXCEPT (SELECT NULL UNION ALL SELECT * FROM generate_series(1,99999));
Buckets: 2048 (originally 256) Memory Usage: hashtable: 48kB, tuples: 8Kb

--recursive union:
explain analyze verbose WITH RECURSIVE t(n) AS ( SELECT 'foo' UNION SELECT n || ' bar' FROM t WHERE length(n) < 9999) SELECT n, n IS OF (text) AS is_text FROM t;

--subplan
explain analyze verbose SELECT i FROM generate_series(1,999)i WHERE (i,i) NOT IN (SELECT 1,1 UNION ALL SELECT j,j FROM generate_series(1,99999)j);
Buckets: 262144 (originally 131072) Memory Usage: hashtable: 6144kB, tuples: 782Kb
explain analyze verbose select i FROM generate_series(1,999)i WHERE(1,i) NOT in (select i,null::int from t) ;

--Agg:
explain (analyze,verbose) SELECT A,COUNT(1) FROM generate_series(1,99999)a GROUP BY 1;
Buckets: 262144 (originally 256) Memory Usage: hashtable: 6144kB, tuples: 782Kb

explain (analyze, verbose) select i FROM generate_series(1,999)i WHERE(1,1) not in (select a,null from (SELECT generate_series(1,99999) a)x) ;

explain analyze verbose select * from (SELECT a FROM generate_series(1,99)a)v left join lateral (select v.a, four, ten, count(*) from (SELECT b four, 2 ten, b FROM generate_series(1,999)b)x group by cube(four,ten)) s on true order by v.a,four,ten;

--Grouping sets:
explain analyze verbose select unique1,
count(two), count(four), count(ten),
count(hundred), count(thousand), count(twothousand),
count(*)
from tenk1 group by grouping sets (unique1,twothousand,thousand,hundred,ten,four,two);

--
Justin

Attachment Content-Type Size
v2-0001-Run-some-existing-tests-with-explain-ANALYZE.patch text/x-diff 48.5 KB
v2-0002-explain-to-show-tuplehash-bucket-and-memory-stats.patch text/x-diff 27.7 KB
v2-0003-Gross-hack-to-put-hash-stats-of-subplans-in-the-r.patch text/x-diff 7.1 KB
v2-0004-implement-hash-stats-for-bitmapHeapScan.patch text/x-diff 5.5 KB
v2-0005-Refactor-for-consistency-symmetry.patch text/x-diff 14.7 KB
v2-0006-TupleHashTable.entrysize-was-unused-except-for-in.patch text/x-diff 1.6 KB
v2-0007-Update-comment-obsolete-since-69c3936a.patch text/x-diff 871 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Cramer 2020-02-16 00:50:40 Re: New messages from Priscilla Ip
Previous Message Andrew Dunstan 2020-02-15 22:40:02 Re: Just for fun: Postgres 20?