Re: BUG #19517: Eager Aggregation produces wrong count(*) when pushed into RHS of Hash Semi Join

From: Tender Wang <tndrwang(at)gmail(dot)com>
To: ihalatci(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Subject: Re: BUG #19517: Eager Aggregation produces wrong count(*) when pushed into RHS of Hash Semi Join
Date: 2026-06-11 01:03:22
Message-ID: CAHewXNm8cGaSTXuZgmJLhXYHsatqhAuOpo5fnCSQQog3aRhFrA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi all

Matheus Alcantara <matheusssilv97(at)gmail(dot)com> 于2026年6月11日周四 04:19写道:
>
> On Wed Jun 10, 2026 at 12:44 PM -03, PG Bug reporting form wrote:
> > Likely cause
> > ------------
> > Eager Aggregation appears to consider the RHS of a SEMI/ANTI join a legal
> > site to push a partial aggregate, but for SEMI joins each outer row matches
> > at most one inner row by the join's own semantics, so partial counts taken
> > on the inner side cannot be combined into a correct outer-side count(*).
> > Either eager aggregation must be disabled for the RHS of SEMI/ANTI joins,
> > or the partial state must be projected through the semi-join such that
> > each surviving outer row contributes exactly one partial.
> >
> > Two follow-up commits to eager aggregation landed on 2026-04-06
> > ("Fix volatile function evaluation in eager aggregation" and
> > "Fix collation handling for grouping keys in eager aggregation"), but
> > neither addresses semi-join correctness.
> >
>
> Wondering if commit ffeda04259b (Fix eager aggregation for semi/antijoin
> inner rels) is about fixing this issue. Can you please check it? I think
> that 19beta1 was created before the fix was applied on master.
>

Yes, this commit ffeda04259b has fixed this problem. But it wasn't
included in the 19beta1.
I tried on HEAD, and the result was correct.

postgres=# SELECT u.user_id, count(*)
FROM u
WHERE u.value_2 = ANY (SELECT e.value_2
FROM e
WHERE u.user_id > e.user_id)
GROUP BY u.user_id;
user_id | count
---------+-------
5 | 26
(1 row)

postgres=# explain SELECT u.user_id, count(*)
FROM u
WHERE u.value_2 = ANY (SELECT e.value_2
FROM e
WHERE u.user_id > e.user_id)
GROUP BY u.user_id;
QUERY PLAN
---------------------------------------------------------------------
HashAggregate (cost=4.76..4.77 rows=1 width=12)
Group Key: u.user_id
-> Hash Semi Join (cost=3.25..4.71 rows=9 width=4)
Hash Cond: (u.value_2 = e.value_2)
Join Filter: (u.user_id > e.user_id)
-> Seq Scan on u (cost=0.00..1.26 rows=26 width=8)
-> Hash (cost=2.00..2.00 rows=100 width=8)
-> Seq Scan on e (cost=0.00..2.00 rows=100 width=8)
(8 rows)

--
Thanks,
Tender Wang

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2026-06-11 01:48:30 Re: [PATCH] contrib/xml2: backend crash in xpath_nodeset() on the namespace axis
Previous Message Andrey Chernyy 2026-06-11 00:14:36 [PATCH] contrib/xml2: backend crash in xpath_nodeset() on the namespace axis