Quick Links

Re: Eager aggregation, take 3

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>
Cc:	Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Eager aggregation, take 3
Date:	2025-09-09 14:30:01
Message-ID:	CA+TgmobT156YuP5HUscbR_17nQs=KC703T=SB3VkQMcFbK4Gqw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Sep 9, 2025 at 6:30 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > I think it would be worth considering generating the partially grouped
> > relations in a second pass. Right now, as you progress from the bottom
> > of the join tree towards the top, you created grouped rels as you go.
> > But you could equally well finish planning everything up to the
> > scan/join target first and then go back and add grouped_rels to
> > relations where it seems worthwhile.
>
> Hmm, I don't think so. I think the presence of eager aggregation
> could change the best join order. For example, without eager
> aggregation, the optimizer might find that (A JOIN B) JOIN C the best
> join order. But with eager aggregation on B, the optimizer could
> prefer A JOIN (AGG(B) JOIN C). I'm not sure how we could find the
> best join order with eager aggregation applied without building the
> join tree from the bottom up.

Oh, that is a problem, yes. :-(

> > I haven't done a detailed comparison of generate_grouped_paths() to
> > other parts of the code, but I have an uncomfortable feeling that it
> > might be rather similar to some existing code that probably already
> > exists in multiple, slightly-different versions. Is there any
> > refactoring we could do here?
>
> Yeah, we currently have several functions that do similar, but not
> exactly the same, things. Maybe some refactoring is possible -- maybe
> not -- I haven't looked into it closely yet. However, I'd prefer to
> address that in a separate patch if possible, since this issue also
> exists on master, and I want to avoid introducing such changes in this
> already large patch.

Well, it's not just a matter of "this already exists" -- it gets
harder and harder to unify things the more near-copies you add.

> Good point. I do have manually tested GEQO by setting geqo_threshold
> to 2 and running the regression tests to check for any planning
> errors, crashes, or incorrect results. However, I'm not sure where
> test cases for GEQO should be added. I searched the regression tests
> and found only one explicit GEQO test, added back in 2009 (commit
> a43b190e3). It's not quite clear to me what the current policy is for
> adding GEQO test cases.
>
> Anyway, I will add some test cases in eager_aggregate.sql with
> geqo_threshold set to 2.

Sounds good. I think GEQO is mostly-unmaintained these days, but if
we're updating the code, I think it is good to add tests. Being that
the code is so old, it probably lacks adequate test coverage.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Re: Eager aggregation, take 3 at 2025-09-09 10:30:04 from Richard Guo

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Aleksander Alekseev	2025-09-09 14:37:51	Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization
Previous Message	Peter Eisentraut	2025-09-09 14:24:02	stray references to SubscriptRef type