Re: explain HashAggregate to report bucket and memory stats

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: explain HashAggregate to report bucket and memory stats
Date: 2020-02-22 23:00:58
Message-ID: 20200222230058.GT31889@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 22, 2020 at 10:53:35PM +0100, Tomas Vondra wrote:
> I've started looking at this patch, because I've been long missing the

Thanks for looking

I have brief, initial comments before I revisit the patch.

> 3) Almost all executor nodes that are modified to include this new
> instrumentation struct also include TupleHashTable, and the data are
> essentially about the hash table. So my question is why not to include
> this into TupleHashTable - that would mean we don't need to modify any
> executor nodes, and it'd probably simplify code in explain.c too because
> we could simply pass the hashtable.

I considered this. From 0004 commit message:

| Also, if instrumentation were implemented in simplehash.h, I think every
| insertion or deletion would need to check ->members and ->size (which isn't
| necessary for Agg, but is necessary in the general case, and specifically for
| tidbitmap, since it actually DELETEs hashtable entries). Or else simplehash
| would need a new function like UpdateTupleHashStats, which the higher level nodes
| would need to call after filling the hashtable or before deleting tuples, which
| seems to defeat the purpose of implementing stats at a lower layer.

> 4) The one exception to (3) is BitmapHeapScanState, which does include
> TIDBitmap and not TupleHashTable. And then we have tbm_instrumentation
> which "fakes" the data based on the pagetable. Maybe this is a sign that
> TIDBitmap needs a slightly different struct?

Hm, I'd say that it "collects" the data that's not immediately present, not
fake it. But maybe I did it poorly. Also, maybe TIDBitmap shouldn't be
included in the patch..

> Also, I'm not sure why we
> actually need tbm_instrumentation()? It just copies the instrumentation
> data from TIDBitmap into the node level, but why couldn't we just look
> at the instrumentation data in TIDBitmap directly?

See 0004 commit message:

| TIDBitmap is a private structure, so add an accessor function to return its
| instrumentation, and duplicate instrumentation struct in BitmapHeapState.

Also, I don't know what anyone else thinks, but I think 0005 is a throwaway
commit. It's implemented more nicely in execGrouping.c.

> But it's definitely strange that we only print memory info in verbose mode -
> IMHO it's much more useful info than the number of buckets etc.

Because I wanted to be able to put "explain analyze" into regression tests
(which can show: "Buckets: 4 (originally 2)"). But cannot get stable output
for any plan which uses Sort, without hacks like explain_sq_limit and
explain_parallel_sort_stats.

Actually, I wish there were a way to control Sort nodes' Memory/Disk output,
too. I'm sure most of regression tests were meant to be run as explain(analyze NO),
but it'd be much better if analyze YES were reasonably easy in the general
case that might include Sort. If someone seconds that, I will start a separate
thread.

--
Justin Pryzby

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-02-23 00:42:16 Re: Memory-Bounded Hash Aggregation
Previous Message Tomas Vondra 2020-02-22 21:53:35 Re: explain HashAggregate to report bucket and memory stats