From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Matheus Alcantara <matheusssilv97(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Eager aggregation, take 3 |
Date: | 2025-08-15 01:41:18 |
Message-ID: | CAMbWs48sHEbQYZ5PPQdJKH6Vi4Hr-XYXkC6EObFhQORMdZNk9w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Aug 15, 2025 at 4:22 AM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
> Debugging this query shows that all if conditions on
> setup_eager_aggregation() returns false and create_agg_clause_infos()
> and create_grouping_expr_infos() are called. The RelAggInfo->agg_useful
> is also being set to true so I would expect to see Finalize and Partial
> agg nodes, is this correct or am I missing something here?
Well, just because eager aggregation *can* be applied does not mean
that it *will* be; it depends on whether it produces a lower-cost
execution plan. This transformation is cost-based, so it's not the
right mindset to assume that it will always be applied when possible.
In your case, with the filter "t2.c = 5", the row estimate for t2 is
just 1 after the filter has been applied. The planner decides that
adding a partial aggregation on top of such a small result set doesn't
offer much benefit, which seems reasonable to me.
-> Hash (cost=18.50..18.50 rows=1 width=12)
(actual time=0.864..0.865 rows=1.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on eager_agg_t2 t2 (cost=0.00..18.50 rows=1 width=12)
(actual time=0.060..0.851
rows=1.00 loops=1)
Filter: (c = '5'::double precision)
Rows Removed by Filter: 999
With the filter "t2.c > 5", the row estimate for t2 is 995 after
filtering. A partial aggregation can reduce that to 10 rows, so the
planner decides that adding a partial aggregation is beneficial -- and
does so. That also seems reasonable to me.
-> Partial HashAggregate (cost=23.48..23.58 rows=10 width=36)
(actual time=2.427..2.438 rows=10.00 loops=1)
Group Key: t2.b
Batches: 1 Memory Usage: 32kB
-> Seq Scan on eager_agg_t2 t2 (cost=0.00..18.50 rows=995 width=12)
(actual time=0.053..0.989
rows=995.00 loops=1)
Filter: (c > '5'::double precision)
Rows Removed by Filter: 5
> Is this behavior correct? If it's correct, would be possible to check
> this limitation on setup_eager_aggregation() and maybe skip all the
> other work?
Hmm, I wouldn't consider this a limitation; it's just the result of
the planner's cost-based tournament for path selection.
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | Chao Li | 2025-08-15 01:44:29 | Re: Make pgoutput documentation easier to find |
Previous Message | Peter Smith | 2025-08-15 00:53:10 | Re: Skipping schema changes in publication |