BUG #19517: Eager Aggregation produces wrong count(*) when pushed into RHS of Hash Semi Join

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: ihalatci(at)gmail(dot)com
Subject: BUG #19517: Eager Aggregation produces wrong count(*) when pushed into RHS of Hash Semi Join
Date: 2026-06-10 15:44:31
Message-ID: 19517-4c53dcdcef98d263@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 19517
Logged by: Ibrahim Halatci
Email address: ihalatci(at)gmail(dot)com
PostgreSQL version: 19beta1
Operating system: Linux x86_64
Description:

Description
-----------
The "Implement Eager Aggregation" optimization (commit by Richard Guo,
2025-10-08, master) can push a partial aggregate into the inner side of a
semi-join. In that position the partial count(*) counts inner-side rows
that match the semi-join condition, but the finalize aggregate above the
join then sums those PARTIAL counts as if they were counts of outer rows.
The query therefore returns inflated counts.

Disabling hash aggregation (SET enable_hashagg = off) is a reliable
workaround because it forces the planner away from the bad plan shape.

Minimal self-contained reproducer
---------------------------------
DROP TABLE IF EXISTS u, e;
CREATE TABLE u(user_id int, value_2 int);
CREATE TABLE e(user_id int, value_2 int);
INSERT INTO u SELECT 5, s % 4 FROM generate_series(1, 26) s;
INSERT INTO e SELECT s % 5, s % 4 FROM generate_series(1, 100) s;
ANALYZE u;
ANALYZE e;

-- Wrong result on PG19beta1:
SELECT u.user_id, count(*)
FROM u
WHERE u.value_2 = ANY (SELECT e.value_2
FROM e
WHERE u.user_id > e.user_id)
GROUP BY u.user_id;

-- Observed on PG19beta1: (5, 130)
-- Expected (and result on PG18, PG17, PG16): (5, 26)

-- Workaround restores the correct result:
SET enable_hashagg = off;
SELECT u.user_id, count(*)
FROM u
WHERE u.value_2 = ANY (SELECT e.value_2
FROM e
WHERE u.user_id > e.user_id)
GROUP BY u.user_id;
-- Returns: (5, 26)
RESET enable_hashagg;

EXPLAIN (VERBOSE, COSTS OFF) of the bad plan
--------------------------------------------
The diagnostic indicator is a "Hash Semi Join" whose Output list contains
"(PARTIAL count(*))", sourced from a "Partial HashAggregate" keyed on the
inner relation's columns. The "Finalize HashAggregate" above the join then
sums these inner-side partial counts instead of counting matched outer rows.

Likely cause
------------
Eager Aggregation appears to consider the RHS of a SEMI/ANTI join a legal
site to push a partial aggregate, but for SEMI joins each outer row matches
at most one inner row by the join's own semantics, so partial counts taken
on the inner side cannot be combined into a correct outer-side count(*).
Either eager aggregation must be disabled for the RHS of SEMI/ANTI joins,
or the partial state must be projected through the semi-join such that
each surviving outer row contributes exactly one partial.

Two follow-up commits to eager aggregation landed on 2026-04-06
("Fix volatile function evaluation in eager aggregation" and
"Fix collation handling for grouping keys in eager aggregation"), but
neither addresses semi-join correctness.

Search performed before reporting
---------------------------------
- pgsql-bugs archives for 2026-04, 2026-05, and 2026-06: no matching
report.
- pgsql-hackers search "eager aggregation semi": 0 hits.
- pgsql-hackers search "eager aggregation bug": only the original
development thread by Richard Guo.
- PostgreSQL 19 Open Items wiki (checked 2026-06-10): not listed under
Open Issues, Resolved before 19beta1/19beta2, Non-bugs, or Won't Fix.

Impact on downstream
--------------------
This was discovered while bringing up Citus on PG19; the regression
test multi_subquery_in_where_reference_clause exposes the same plan shape
through a distributed query whose worker-side SQL reduces to the pattern
above. We have applied SET enable_hashagg = off locally as a temporary
workaround.

Thank you,

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Matheus Alcantara 2026-06-10 20:19:09 Re: BUG #19517: Eager Aggregation produces wrong count(*) when pushed into RHS of Hash Semi Join
Previous Message Tom Lane 2026-06-10 14:38:56 Re: BUG #19515: Creating a faulty BRIN operator class can cause the server to crash when used.