From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Paul George <p(dot)a(dot)george19(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Eager aggregation, take 3 |
Date: | 2025-09-09 14:30:01 |
Message-ID: | CA+TgmobT156YuP5HUscbR_17nQs=KC703T=SB3VkQMcFbK4Gqw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Sep 9, 2025 at 6:30 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > I think it would be worth considering generating the partially grouped
> > relations in a second pass. Right now, as you progress from the bottom
> > of the join tree towards the top, you created grouped rels as you go.
> > But you could equally well finish planning everything up to the
> > scan/join target first and then go back and add grouped_rels to
> > relations where it seems worthwhile.
>
> Hmm, I don't think so. I think the presence of eager aggregation
> could change the best join order. For example, without eager
> aggregation, the optimizer might find that (A JOIN B) JOIN C the best
> join order. But with eager aggregation on B, the optimizer could
> prefer A JOIN (AGG(B) JOIN C). I'm not sure how we could find the
> best join order with eager aggregation applied without building the
> join tree from the bottom up.
Oh, that is a problem, yes. :-(
> > I haven't done a detailed comparison of generate_grouped_paths() to
> > other parts of the code, but I have an uncomfortable feeling that it
> > might be rather similar to some existing code that probably already
> > exists in multiple, slightly-different versions. Is there any
> > refactoring we could do here?
>
> Yeah, we currently have several functions that do similar, but not
> exactly the same, things. Maybe some refactoring is possible -- maybe
> not -- I haven't looked into it closely yet. However, I'd prefer to
> address that in a separate patch if possible, since this issue also
> exists on master, and I want to avoid introducing such changes in this
> already large patch.
Well, it's not just a matter of "this already exists" -- it gets
harder and harder to unify things the more near-copies you add.
> Good point. I do have manually tested GEQO by setting geqo_threshold
> to 2 and running the regression tests to check for any planning
> errors, crashes, or incorrect results. However, I'm not sure where
> test cases for GEQO should be added. I searched the regression tests
> and found only one explicit GEQO test, added back in 2009 (commit
> a43b190e3). It's not quite clear to me what the current policy is for
> adding GEQO test cases.
>
> Anyway, I will add some test cases in eager_aggregate.sql with
> geqo_threshold set to 2.
Sounds good. I think GEQO is mostly-unmaintained these days, but if
we're updating the code, I think it is good to add tests. Being that
the code is so old, it probably lacks adequate test coverage.
--
Robert Haas
EDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Aleksander Alekseev | 2025-09-09 14:37:51 | Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization |
Previous Message | Peter Eisentraut | 2025-09-09 14:24:02 | stray references to SubscriptRef type |