Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.
Date: 2020-07-26 22:48:45
Message-ID: CAApHDvqTvD3bXBiC5cmuoATFYeP+RSkv5WTgfyuBb6JpwKmSoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 1 Jul 2020 at 18:46, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Tue, Jun 30, 2020, 7:04 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>>
>> Does anyone have any objections to that being changed?
>
> That's OK with me. By the way, I'm on vacation and will catch up on these HashAgg threads next week.

(Adding Justin as I know he's expressed interest in the EXPLAIN output
of HashAgg before)

I've written a patch to bring the HashAgg EXPLAIN ANALYZE output to be
more aligned to the Hash Join output.

Couple of things I observed about Hash Join EXPLAIN ANALYZE:
1. The number of batches starts at 1.
2. We always display the number of batches.
3. We write "Batches" for text format and "Hash Batches" for non-text formats.
4. We write "Memory Usage" for text format and "Peak Memory Usage" for
non-text formats.
5. "Batches" comes before memory usage.

Before this patch, HashAgg EXPLAIN ANALYZE output would:
1. Start the number of batches at 0.
2. Only display "Hash Batches" when batches > 0.
3. Used the words "HashAgg Batches" for text and non-text formats.
4. Used the words "Peak Memory Usage" for text and non-text formats.
5. "Hash Batches" was written after memory usage.

In the attached patch I've changed HashAgg to be aligned to Hash Join
on each of the points above.

e.g.

Before:

postgres=# explain analyze select c.relname,count(*) from pg_class c
inner join pg_Attribute a on c.oid = a.attrelid group by c.relname;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=138.37..142.23 rows=386 width=72) (actual
time=3.121..3.201 rows=427 loops=1)
Group Key: c.relname
Peak Memory Usage: 109kB
-> Hash Join (cost=21.68..124.10 rows=2855 width=64) (actual
time=0.298..1.768 rows=3153 loops=1)
Hash Cond: (a.attrelid = c.oid)
-> Seq Scan on pg_attribute a (cost=0.00..93.95 rows=3195
width=4) (actual time=0.011..0.353 rows=3153 loops=1)
-> Hash (cost=16.86..16.86 rows=386 width=68) (actual
time=0.279..0.279 rows=427 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 50kB
-> Seq Scan on pg_class c (cost=0.00..16.86 rows=386
width=68) (actual time=0.007..0.112 rows=427 loops=1)
Planning Time: 0.421 ms
Execution Time: 3.294 ms
(11 rows)

After:

postgres=# explain analyze select c.relname,count(*) from pg_class c
inner join pg_Attribute a on c.oid = a.attrelid group by c.relname;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=566.03..580.00 rows=1397 width=72) (actual
time=13.097..13.430 rows=1397 loops=1)
Group Key: c.relname
Batches: 1 Memory Usage: 321kB
-> Hash Join (cost=64.43..496.10 rows=13985 width=64) (actual
time=0.838..7.546 rows=13985 loops=1)
Hash Cond: (a.attrelid = c.oid)
-> Seq Scan on pg_attribute a (cost=0.00..394.85 rows=13985
width=4) (actual time=0.010..1.462 rows=13985 loops=1)
-> Hash (cost=46.97..46.97 rows=1397 width=68) (actual
time=0.820..0.821 rows=1397 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 153kB
-> Seq Scan on pg_class c (cost=0.00..46.97 rows=1397
width=68) (actual time=0.009..0.362 rows=1397 loops=1)
Planning Time: 0.440 ms
Execution Time: 13.634 ms
(11 rows)

(ignore the change in memory consumption. That was due to adding
records for testing)

Any objections to this change?

David

Attachment Content-Type Size
yet_more_hashagg_explain_fixes.patch application/octet-stream 3.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2020-07-26 23:17:38 Re: hashagg slowdown due to spill changes
Previous Message Thomas Munro 2020-07-26 22:17:36 Re: Parallel bitmap index scan