Re: Eager aggregation, take 3

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Eager aggregation, take 3
Date: 2025-08-15 01:41:18
Message-ID: CAMbWs48sHEbQYZ5PPQdJKH6Vi4Hr-XYXkC6EObFhQORMdZNk9w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 15, 2025 at 4:22 AM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
> Debugging this query shows that all if conditions on
> setup_eager_aggregation() returns false and create_agg_clause_infos()
> and create_grouping_expr_infos() are called. The RelAggInfo->agg_useful
> is also being set to true so I would expect to see Finalize and Partial
> agg nodes, is this correct or am I missing something here?

Well, just because eager aggregation *can* be applied does not mean
that it *will* be; it depends on whether it produces a lower-cost
execution plan. This transformation is cost-based, so it's not the
right mindset to assume that it will always be applied when possible.

In your case, with the filter "t2.c = 5", the row estimate for t2 is
just 1 after the filter has been applied. The planner decides that
adding a partial aggregation on top of such a small result set doesn't
offer much benefit, which seems reasonable to me.

-> Hash (cost=18.50..18.50 rows=1 width=12)
(actual time=0.864..0.865 rows=1.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on eager_agg_t2 t2 (cost=0.00..18.50 rows=1 width=12)
(actual time=0.060..0.851
rows=1.00 loops=1)
Filter: (c = '5'::double precision)
Rows Removed by Filter: 999

With the filter "t2.c > 5", the row estimate for t2 is 995 after
filtering. A partial aggregation can reduce that to 10 rows, so the
planner decides that adding a partial aggregation is beneficial -- and
does so. That also seems reasonable to me.

-> Partial HashAggregate (cost=23.48..23.58 rows=10 width=36)
(actual time=2.427..2.438 rows=10.00 loops=1)
Group Key: t2.b
Batches: 1 Memory Usage: 32kB
-> Seq Scan on eager_agg_t2 t2 (cost=0.00..18.50 rows=995 width=12)
(actual time=0.053..0.989
rows=995.00 loops=1)
Filter: (c > '5'::double precision)
Rows Removed by Filter: 5

> Is this behavior correct? If it's correct, would be possible to check
> this limitation on setup_eager_aggregation() and maybe skip all the
> other work?

Hmm, I wouldn't consider this a limitation; it's just the result of
the planner's cost-based tournament for path selection.

Thanks
Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2025-08-15 01:44:29 Re: Make pgoutput documentation easier to find
Previous Message Peter Smith 2025-08-15 00:53:10 Re: Skipping schema changes in publication