Quick Links

Re: Eager aggregation, take 3

From:	Tender Wang <tndrwang(at)gmail(dot)com>
To:	Radim Marek <radim(at)boringsql(dot)com>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eager aggregation, take 3
Date:	2026-05-31 11:28:37
Message-ID:	CAHewXNkdhBrQR3mRmLw6FThgXd-tD2LmgfLreb+OeiyXmO2PPQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Radim Marek <radim(at)boringsql(dot)com> 于2026年5月29日周五 23:55写道：
>
> Hey Richard,
>
> I might be out of my depth here - but while testing RegreSQL as correctness/performance harness on PostgreSQL it picked up a problem with the wrong-results case during eager aggregation.
>
> It reproduces on current HEAD (commit 2670cc298f42cd7b1c426bf7ccfb0652d8e0b347 now) with enable_eager_aggregate enabled.
>
> My testing environment
> - Linux aarch64, gcc 12 (Debian)
> - macOS arm64, Apple clang 21
> (PostgreSQL 19devel on aarch64-apple-darwin25.5.0)
>
> == How to reproduce
>
> CREATE TEMP TABLE c(id int, country text);
> CREATE TEMP TABLE o(customer_id int);
> INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
> INSERT INTO o VALUES (1),(3); -- only customers 1 and 3 have a row in o
>
> SELECT c.country, count(*) AS n
> FROM c
> WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
> GROUP BY c.country
> ORDER BY c.country;
>
> Expected results (everywhere except master)
>
> country | n
> ---------+---
> DE | 2
> US | 1
> (2 rows)
>
> The actual result with enable_eager_aggregate = on (default)
>
> country | n
> ---------+---
> DE | 0
> US | 0
> (2 rows)
>
> With SET enable_eager_aggregate = off, the result is correct (DE=2, US=1), as it is on PG18.
>
> Query Plan
>
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=108.19..108.69 rows=200 width=40) (actual time=0.195..0.197 rows=2.00 loops=1)
> Sort Key: c.country
> Sort Method: quicksort Memory: 25kB
> Buffers: local hit=2
> -> Finalize HashAggregate (cost=98.55..100.55 rows=200 width=40) (actual time=0.183..0.186 rows=2.00 loops=1)
> Group Key: c.country
> Batches: 1 Memory Usage: 32kB
> Buffers: local hit=2
> -> Hash Anti Join (cost=52.75..95.37 rows=635 width=40) (actual time=0.177..0.179 rows=3.00 loops=1)
> Hash Cond: (c.id = o.customer_id)
> Buffers: local hit=2
> -> Seq Scan on c (cost=0.00..22.70 rows=1270 width=36) (actual time=0.024..0.025 rows=5.00 loops=1)
> Buffers: local hit=1
> -> Hash (cost=50.25..50.25 rows=200 width=12) (actual time=0.145..0.146 rows=2.00 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 9kB
> Buffers: local hit=1
> -> Partial HashAggregate (cost=48.25..50.25 rows=200 width=12) (actual time=0.122..0.123 rows=2.00 loops=1)
> Group Key: o.customer_id
> Batches: 1 Memory Usage: 32kB
> Buffers: local hit=1
> -> Seq Scan on o (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.003 rows=2.00 loops=1)
> Buffers: local hit=1
> Planning Time: 0.294 ms
> Execution Time: 0.255 ms
> (24 rows)
>
> If this is already known or in progress, apologies for the noise.
Thanks for the report. This is a bug.
When we use eager_agg, it can reduce many tuples before doing a join
on the partial agg side.
After partial agg, when we are doing a join, the matched rows will be
significantly reduced.
This is also the effect we want to achieve from eager_agg.

But we should be careful about anti-join. Because we will ignore the
matched row. The aggregate of unmatched rows seems wrong.
And I can get the wrong results from the semi-join, too.
For example:
postgres=# CREATE TEMP TABLE c(id int, country text);
CREATE TEMP TABLE o(customer_id int);
INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
INSERT INTO o VALUES (1),(3);
CREATE TABLE
CREATE TABLE
INSERT 0 5
INSERT 0 2
postgres=# insert into o values (1);
INSERT 0 1
-- correct result
postgres=# SELECT c.country, count(*) AS n
FROM c
WHERE EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;
country | n
---------+---
DE | 1
US | 1
(2 rows)

I do some hacks that make the cost of the path created in
make_grouped_join_rel() very small.
So we can get a partial agg plan, as follow:

postgres=# explain SELECT c.country, count(*) AS n
FROM c
WHERE EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;
QUERY PLAN
-----------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=31.56..38.32 rows=200 width=40)
Group Key: c.country
-> Sort (cost=31.56..33.15 rows=635 width=40)
Sort Key: c.country
-> Hash Semi Join (cost=1.00..2.00 rows=635 width=40)
Hash Cond: (c.id = o.customer_id)
-> Seq Scan on c (cost=0.00..22.70 rows=1270 width=36)
-> Hash (cost=200.91..200.91 rows=200 width=12)
-> Partial GroupAggregate (cost=179.78..200.91
rows=200 width=12)
Group Key: o.customer_id
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: o.customer_id
-> Seq Scan on o (cost=0.00..35.50
rows=2550 width=4)
(13 rows)

postgres=# SELECT c.country, count(*) AS n
FROM c
WHERE EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;
country | n
---------+---
DE | 1
US | 2
(2 rows)

You can see that the count(us) has 2. Because partial agg
pre-aggregates the results for country =1.
However, for the semantics of semi-join, it returns once a match is found.

I haven't thought about it too deeply yet. Maybe we can do something
in the make_grouped_join_rel().
...
if (sjinfo->jointype == JOIN_ANTI || sjinfo->jointype == JOIN_SEMI)
return;
...
The fixes above can temporarily resolve these issues. But it seems too strict.

--
Thanks,
Tender Wang

In response to

Re: Eager aggregation, take 3 at 2026-05-29 15:55:01 from Radim Marek

Responses

Re: Eager aggregation, take 3 at 2026-06-01 07:19:03 from Richard Guo

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	ZizhuanLiu X-MAN	2026-05-31 11:39:06	Re: Make transformAExprIn() return a flattened bool expression directly
Previous Message	Alexander Lakhin	2026-05-31 11:00:01	Re: Exit walsender before confirming remote flush in logical replication