Re: Eager aggregation, take 3

From: Radim Marek <radim(at)boringsql(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Eager aggregation, take 3
Date: 2026-05-29 15:55:01
Message-ID: CAJgoLk+d_P5sKrx-SZt01Acm_j0QnWn6aKJzFJ=waRu_3C8AoQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hey Richard,

I might be out of my depth here - but while testing RegreSQL as
correctness/performance harness on PostgreSQL it picked up a problem with
the wrong-results case during eager aggregation.

It reproduces on current HEAD
(commit 2670cc298f42cd7b1c426bf7ccfb0652d8e0b347 now)
with enable_eager_aggregate enabled.

My testing environment
- Linux aarch64, gcc 12 (Debian)
- macOS arm64, Apple clang 21
(PostgreSQL 19devel on aarch64-apple-darwin25.5.0)

== How to reproduce

CREATE TEMP TABLE c(id int, country text);
CREATE TEMP TABLE o(customer_id int);
INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
INSERT INTO o VALUES (1),(3); -- only customers 1 and 3 have a row in o

SELECT c.country, count(*) AS n
FROM c
WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;

Expected results (everywhere except master)

country | n
---------+---
DE | 2
US | 1
(2 rows)

The actual result with enable_eager_aggregate = on (default)

country | n
---------+---
DE | 0
US | 0
(2 rows)

With SET enable_eager_aggregate = off, the result is correct (DE=2, US=1),
as it is on PG18.

Query Plan

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=108.19..108.69 rows=200 width=40) (actual time=0.195..0.197
rows=2.00 loops=1)
Sort Key: c.country
Sort Method: quicksort Memory: 25kB
Buffers: local hit=2
-> Finalize HashAggregate (cost=98.55..100.55 rows=200 width=40)
(actual time=0.183..0.186 rows=2.00 loops=1)
Group Key: c.country
Batches: 1 Memory Usage: 32kB
Buffers: local hit=2
-> Hash Anti Join (cost=52.75..95.37 rows=635 width=40) (actual
time=0.177..0.179 rows=3.00 loops=1)
Hash Cond: (c.id = o.customer_id)
Buffers: local hit=2
-> Seq Scan on c (cost=0.00..22.70 rows=1270 width=36)
(actual time=0.024..0.025 rows=5.00 loops=1)
Buffers: local hit=1
-> Hash (cost=50.25..50.25 rows=200 width=12) (actual
time=0.145..0.146 rows=2.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: local hit=1
-> Partial HashAggregate (cost=48.25..50.25 rows=200
width=12) (actual time=0.122..0.123 rows=2.00 loops=1)
Group Key: o.customer_id
Batches: 1 Memory Usage: 32kB
Buffers: local hit=1
-> Seq Scan on o (cost=0.00..35.50 rows=2550
width=4) (actual time=0.002..0.003 rows=2.00 loops=1)
Buffers: local hit=1
Planning Time: 0.294 ms
Execution Time: 0.255 ms
(24 rows)

If this is already known or in progress, apologies for the noise.

---

Radim

On Fri, 29 May 2026 at 17:25, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> Hi All,
>
> Eager aggregation is a query optimization technique that partially
> pushes a group-by past a join, and finalizes it once all the relations
> are joined. Eager aggregation reduces the number of input rows to the
> join and thus may result in a better overall plan. This technique is
> thoroughly described in the 'Eager Aggregation and Lazy Aggregation'
> paper [1].
>
> Back in 2017, a patch set has been proposed by Antonin Houska to
> implement eager aggregation in thread [2]. However, it was at last
> withdrawn after entering the pattern of "please rebase thx" followed by
> rebasing and getting no feedback until "please rebase again thx". A
> second attempt in 2022 unfortunately fell into the same pattern about
> one year ago and was eventually closed again [3].
>
> That patch set has included most of the necessary concepts to implement
> eager aggregation. However, as far as I can see, it has several weak
> points that we need to address. It introduces invasive changes to some
> core planner functions, such as make_join_rel(). And with such changes
> join_is_legal() would be performed three times for the same proposed
> join, which is not great. Another weak point is that the complexity of
> join searching dramatically increases with the growing number of
> relations to be joined. This occurs because when we generate partially
> aggregated paths, each path of the input relation is considered as an
> input path for the grouped paths. As a result, the number of grouped
> paths we generate increases exponentially, leading to a significant
> explosion in computational complexity. Other weak points include the
> lack of support for outer joins and partitionwise joins. And during my
> review of the code, I came across several bugs (planning error or crash)
> that need to be addressed.
>
> I'd like to give it another take to implement eager aggregation, while
> borrowing lots of concepts and many chunks of codes from the previous
> patch set. Please see attached. I have chosen to use the term 'Eager
> Aggregation' from the paper [1] instead of 'Aggregation push-down', to
> differentiate the aggregation push-down technique in FDW.
>
> The patch has been split into small pieces to make the review easier.
>
> 0001 introduces the RelInfoList structure, which encapsulates both a
> list and a hash table, so that we can leverage the hash table for faster
> lookups not only for join relations but also for upper relations. With
> eager aggregation, it is possible that we generate so many upper rels of
> type UPPERREL_PARTIAL_GROUP_AGG that a hash table can help a lot with
> lookups.
>
> 0002 introduces the RelAggInfo structure to store information needed to
> create grouped paths for base and join rels. It also revises the
> RelInfoList related structures and functions so that they can be used
> with RelAggInfos.
>
> 0003 checks if eager aggregation is applicable, and if so, collects
> suitable aggregate expressions and grouping expressions in the query,
> and records them in root->agg_clause_list and root->group_expr_list
> respectively.
>
> 0004 implements the functions that check if eager aggregation is
> applicable for a given relation, and if so, create RelAggInfo structure
> for the relation, using the infos about aggregate expressions and
> grouping expressions we collected earlier. In this patch, when we check
> if a target expression can act as grouping expression, we need to check
> if this expression can be known equal to other expressions due to ECs
> that can act as grouping expressions. This patch leverages function
> exprs_known_equal() to achieve that, after enhancing this function to
> consider opfamily if provided.
>
> 0005 implements the functions that generate paths for grouped relations
> by adding sorted and hashed partial aggregation paths on top of paths of
> the plain base or join relations. For sorted partial aggregation paths,
> we only consider any suitably-sorted input paths as well as sorting the
> cheapest-total path. For hashed partial aggregation paths, we only
> consider the cheapest-total path as input. By not considering other
> paths we can reduce the number of grouping paths as much as possible
> while still achieving reasonable results.
>
> 0006 builds grouped relations for each base relation if possible, and
> generates aggregation paths for the grouped base relations.
>
> 0007 builds grouped relations for each just-processed join relation if
> possible, and generates aggregation paths for the grouped join
> relations. The changes made to make_join_rel() are relatively minor,
> with the addition of a new function make_grouped_join_rel(), which finds
> or creates a grouped relation for the just-processed joinrel, and
> generates grouped paths by joining a grouped input relation with a
> non-grouped input relation.
>
> The other way to generate grouped paths is by adding sorted and hashed
> partial aggregation paths on top of paths of the joinrel. This occurs
> in standard_join_search(), after we've run set_cheapest() for the
> joinrel. The reason for performing this step after set_cheapest() is
> that we need to know the joinrel's cheapest paths (see 0005).
>
> This patch also makes the grouped relation for the topmost join rel act
> as the upper rel representing the result of partial aggregation, so that
> we can add the final aggregation on top of that. Additionally, this
> patch extends the functionality of eager aggregation to work with
> partitionwise join and geqo.
>
> This patch also makes eager aggregation work with outer joins. With
> outer join, the aggregate cannot be pushed down if any column referenced
> by grouping expressions or aggregate functions is nullable by an outer
> join above the relation to which we want to apply the partiall
> aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
> can easily identify such situations and subsequently refrain from
> pushing down the aggregates.
>
> Starting from this patch, you should be able to see plans with eager
> aggregation.
>
> 0008 adds test cases for eager aggregation.
>
> 0009 adds a section in README that describes this feature (copied from
> previous patch set, with minor tweaks).
>
> Thoughts and comments are welcome.
>
> [1] https://www.vldb.org/conf/1995/P345.PDF
> [2] https://www.postgresql.org/message-id/flat/9666.1491295317%40localhost
> [3]
> https://www.postgresql.org/message-id/flat/OS3PR01MB66609589B896FBDE190209F495EE9%40OS3PR01MB6660.jpnprd01.prod.outlook.com
>
> Thanks
> Richard
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2026-05-29 15:56:19 Re: Heads Up: cirrus-ci is shutting down June 1st
Previous Message Fujii Masao 2026-05-29 15:54:32 Re: some utf8 breaking substring(txt,1,3) but not substring(txt from '^.{4}')