From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Eager aggregation, take 3 |
Date: | 2025-06-26 02:01:35 |
Message-ID: | CAMbWs48F8WGA-Lzj1Dk76mFqRFxPEwG2_9Zb7+pFs8oi6ew2pw@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jun 13, 2025 at 4:41 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> I've switched back to this thread and will begin by working through
> the key concerns that were previously raised.
>
> The first concern is the lack of a proof demonstrating the correctness
> of this transformation. To address this, I plan to include a detailed
> proof in the README, along the lines of the following.
> The second concern is that a RelOptInfo representing a grouped
> relation may include paths that produce different row sets due to
> partial aggregation being applied at different join levels. This
> potentially violates a fundamental assumption in the planner.
>
> Additionally, the patch currently performs an exhaustive search by
> exploring partial aggregation at every possible join level, leading to
> excessive planning effort, which may not be justified by the
> cost-benefit ratio.
>
> To address these concerns, I'm thinking that maybe we can adopt a
> strategy where partial aggregation is only pushed to the lowest
> possible level in the join tree that is deemed useful. In other
> words, if we can build a grouped path like "AGG(B) JOIN A" -- and
> AGG(B) yields a significant reduction in row count -- we skip
> exploring alternatives like "AGG(A JOIN B)".
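
To make the quoted "AGG(B) JOIN A" versus "AGG(A JOIN B)" distinction concrete, here is a small standalone sketch. It is not code from the patch: the row type, the MAX_KEY bound, and both function names are assumptions made for illustration. It computes a per-key SUM over an equi-join in both orders; for SUM, which can be combined from partial results, the two orders should agree.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEY 8               /* assumption: join keys are ints in [0, MAX_KEY) */

typedef struct BRow
{
    int         a_id;           /* join key referencing A.id */
    long        val;            /* value being summed */
} BRow;

/* AGG(A JOIN B): join every A row to its matching B rows, then sum per key. */
void
agg_after_join(const int *a_ids, size_t na, const BRow *b, size_t nb,
               long *out)
{
    for (size_t k = 0; k < MAX_KEY; k++)
        out[k] = 0;
    for (size_t i = 0; i < na; i++)
    {
        assert(a_ids[i] >= 0 && a_ids[i] < MAX_KEY);
        for (size_t j = 0; j < nb; j++)
            if (b[j].a_id == a_ids[i])
                out[a_ids[i]] += b[j].val;
    }
}

/* AGG(B) JOIN A: partially sum B per key first, then join and combine. */
void
agg_before_join(const int *a_ids, size_t na, const BRow *b, size_t nb,
                long *out)
{
    long        partial[MAX_KEY] = {0};     /* one partial sum per B.a_id group */
    int         has_group[MAX_KEY] = {0};   /* whether B has any row for the key */

    for (size_t j = 0; j < nb; j++)
    {
        assert(b[j].a_id >= 0 && b[j].a_id < MAX_KEY);
        partial[b[j].a_id] += b[j].val;
        has_group[b[j].a_id] = 1;
    }
    for (size_t k = 0; k < MAX_KEY; k++)
        out[k] = 0;
    for (size_t i = 0; i < na; i++)
    {
        assert(a_ids[i] >= 0 && a_ids[i] < MAX_KEY);
        if (has_group[a_ids[i]])
            out[a_ids[i]] += partial[a_ids[i]];     /* final combine step */
    }
}
```

Both functions write MAX_KEY sums into "out". The second one joins A against at most one partial row per key instead of every matching B row, which is where the row-count reduction mentioned in the quote would come from.
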
Here is the patch based on the proposed ideas. It includes the proof
of correctness in the README and implements the strategy of pushing
partial aggregation only to the lowest applicable join level where it
is deemed useful. This is done by introducing a "Relids apply_at"
field to track that level and ensuring that partial aggregation is
applied only at the recorded "apply_at" level.
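
As a rough illustration of the "apply_at" idea, the sketch below models relid sets as a plain bitmask. Only the name "apply_at" comes from the description above; the RelidSet typedef, the GroupedRelSketch struct, and both functions are hypothetical and are not the patch's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t RelidSet;      /* stand-in for Relids: bit i = base rel i */

typedef struct GroupedRelSketch
{
    RelidSet    relids;         /* base rels covered by this grouped rel */
    RelidSet    apply_at;       /* level recorded for partial aggregation */
} GroupedRelSketch;

/* Partial aggregation may be placed on top of this rel only at apply_at. */
bool
can_aggregate_here(const GroupedRelSketch *rel)
{
    return rel->relids == rel->apply_at;
}

/* Joining a grouped input to more rels keeps the recorded level unchanged. */
GroupedRelSketch
join_grouped(const GroupedRelSketch *grouped_input, RelidSet other_relids)
{
    GroupedRelSketch result;

    assert((grouped_input->relids & other_relids) == 0);    /* disjoint inputs */
    result.relids = grouped_input->relids | other_relids;
    result.apply_at = grouped_input->apply_at;
    return result;
}
```

With B as bit 1 and A as bit 0, a grouped rel for {B} with apply_at = {B} passes the check, while the grouped join {A, B} inherits apply_at = {B} and fails it, so under this model only "AGG(B) JOIN A" would be built for that join and "AGG(A JOIN B)" would be skipped.
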
Additionally, this patch changes how grouped relations are stored.
Since each grouped relation represents a partially aggregated version
of a non-grouped relation, we now associate each grouped relation with
the RelOptInfo of the corresponding non-grouped relation. This
eliminates the need for a dedicated list of all grouped relations and
avoids list searches when retrieving a grouped relation.
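
The storage change can be pictured with the following sketch, which contrasts a search through a dedicated list with a direct link from the non-grouped relation. The RelSketch struct, the "grouped_rel" field name, and both lookup functions are assumptions for illustration and may not match the names used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct RelSketch
{
    uint32_t    relids;                 /* bitmask of base rels covered */
    struct RelSketch *grouped_rel;      /* hypothetical field: partially
                                         * aggregated counterpart, or NULL */
} RelSketch;

/* Dedicated-list approach: scan all grouped rels for matching relids. */
RelSketch *
find_grouped_rel_in_list(RelSketch **grouped_rels, size_t n, uint32_t relids)
{
    for (size_t i = 0; i < n; i++)
        if (grouped_rels[i]->relids == relids)
            return grouped_rels[i];
    return NULL;
}

/* Association approach: follow the link stored on the non-grouped rel. */
RelSketch *
get_grouped_rel(const RelSketch *rel)
{
    assert(rel != NULL);
    return rel->grouped_rel;
}
```

Both lookups return the same grouped relation when one exists; the second does so without a scan and without maintaining a separate list.
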
It also addresses other previously raised concerns, such as the risk
of memory blowout with large partial-aggregation values, and includes
improvements to comments and the commit message.

Another change is that this feature is now enabled by default.
Thanks
Richard
Attachment | Content-Type | Size |
---|---|---|
v17-0001-Implement-Eager-Aggregation.patch | application/octet-stream | 165.3 KB |