| From: | David Rowley <dgrowleyml(at)gmail(dot)com> | 
|---|---|
| To: | Carlo Sganzerla <carlo(at)alude(dot)com(dot)br> | 
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-performance(at)postgresql(dot)org, Rafael Almeida <rafael(at)alude(dot)com(dot)br>, Leandro Noman <leandro(at)alude(dot)com(dot)br> | 
| Subject: | Re: GEQO plans much slower than standard join plans | 
| Date: | 2025-10-31 01:19:03 | 
| Message-ID: | CAApHDvq2FR2HfXu7Y7h-cfvvJU40e8NX8w=+1TBkF4+wW5H-eA@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
On Thu, 30 Oct 2025 at 03:57, Carlo Sganzerla <carlo(at)alude(dot)com(dot)br> wrote:
> Yes, I also found that counterintuitive. By turning geqo = off (join/from_collapse_limit were 14) and looking at some metrics we obtained from pg_stat_statements, we found that planning times of the affected queries (the ones which would make use of GEQO) mostly increased (but not by much), which is already the documented behavior. However, one particular query with the same structure like the one on treejoin-14-dims.sql (attached to the first email) had a _decrease_ on planning and execution times, so geqo = off for this query yielded faster plans than geqo = on. This counterintuitive behavior is the reason that I created and shared the script that tried to reproduce it. I'm not sure whether you've run it or not, but on every machine I have run so far I had a 10 fold increase on TPS and a 10 fold decrease on planning times if I turn geqo = off.
>
> I guess what I'm trying to show is that, in very particular cases, GEQO may harm planning times, that's why I felt a little disclaimer could be useful. I'm not sure what else I can add here.
I think what's happening here is that in the query as you've written
it, there isn't much flexibility in the how the tables are joined.
There are 14 EquivalanceClasses in the 14 table case. "f" cannot be
joined to "c13" without "c12", and joining "c12" requires "c11" and so
on.  For queries that join all on the same column, there'd be fewer
EquivalanceClasses and more flexibility in the join order.
Doing:
\set id random(1,10000)
set join_collapse_limit = 14;
set geqo = off;
explain (summary on) select * from f
    join c0 on (c0.id = f.id)
    join c1 on (c1.id = c0.id)
    join c2 on (c1.id = c0.id)
    join c3 on (c3.id = c2.id)
    join c4 on (c4.id = c3.id)
    join c5 on (c5.id = c4.id)
    join c6 on (c6.id = c5.id)
    join c7 on (c7.id = c6.id)
    join c8 on (c8.id = c7.id)
    join c9 on (c9.id = c8.id)
    join c10 on (c10.id = c9.id)
    join c11 on (c11.id = c10.id)
    join c12 on (c12.id = c11.id)
    join c13 on (c13.id = c12.id)
where f.id = :id;
Is much slower to plan as it opens up the possibilities for join order
as there's a single EquivalanceClass with 14 members and any
combination is possible.
I've not studied the GEQO code nearly as much as the standard join
search code, but what it seems to do is try random combinations. For
the case you demonstrated in [1], it discovers a large number of pairs
that cannot be joined as there's no join condition between them. I
suspect that's the issue here. The standard search manages to just
incrementally build up towards the final join incrementally as it's
processing the tables in the same order as they appear in the query
and it never hits the case of not finding a join condition between the
pairs it tries.
Maybe there's some way to make GEQO more efficient by precalculating
which pairs might be good ones to try. That likely could be done
fairly efficiently now that we have RelOptInfo.eclass_indexes, or at
least it would help identify ones that there's no point in trying.
Maybe someone wants to study this more than I have and see what might
be possible. I've not quite got the motivation, at least not today.
David
[1] https://www.postgresql.org/message-id/attachment/183634/treejoin-14-dims.sql
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Pavel Stehule | 2025-10-30 16:27:12 | Re: proposal: schema variables |