Re: should we have a fast-path planning for OLTP starjoins?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: should we have a fast-path planning for OLTP starjoins?
Date: 2026-06-01 14:00:27
Message-ID: CA+Tgmobc6TbNwhk6_2Kxh+nQkyL2k8Qi-msMR4dWC32vsk=BTw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 28, 2026 at 6:11 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> Determine if the query includes a starjoin (or multiple), and remember
> the relids included in the starjoin cluster. Pick a "canonical" join
> order for each starjoin cluster we found (e.g. with dimensions in relid
> order).

I like the idea of trying to avoid considering a bunch of join orders
that are not meaningfully different from each other, but I think we're
only really guaranteed that two join orders are equivalent if both of
them are exactly one-to-one, with the row count neither increasing nor
decreasing. (Even then, it's not 100% equivalent, but probably close
enough.) I don't remember off-hand whether your definition of starjoin
also includes joins that can reduce the row count. If it does, then I
think we might need to be smarter than just putting the dimensions in
relid order. If it does not, then maybe we need to think about whether
the optimization is going to apply to a sufficiently broad range of
cases. Maybe it's fine: if a fact table is joined to lots of dimension
tables, then some of them might have restriction clauses but probably
many of them won't.

This is making me wonder whether we could improve on GEQO by
transforming it from a "plan this query" thing to a "restrict the join
order" thing. For instance, suppose we have two tables T1 and T2 and
we wonder whether swapping the order in which they are joined matters.
We pick a few random partial join orders of tables in the query
excluding T1 and T2 and then check whether appending T1 and then T2
produces significantly different results from appending T2 and then
T1. So for example say we pick (), (T3), (T3-T4), and (T5-T3). We then
test the cost/row count of T1-T2 vs. T2-T1, the cost/row count of
T3-T1-T2 vs. T3-T2-T1, similarly for T3-T4-T1-T2 vs. T3-T4-T2-T1,
similarly for T5-T3-T1-T2 vs. T5-T3-T2-T1. If we see that swapping the
order of T1 and T2 routinely has minimal effect on the outcome, then
it's probably safe to pick one or the other to be joined first and
disregard all the join orders where the two are swapped. By repeating
this kind of testing for various pairs of tables, we could possibly
prune the join search space for large join problems considerably, and
then still do an exhaustive test of the remaining join orders. Maybe
that would produce better results than GEQO currently does (or maybe
not).

Although I kind of like this idea and think it's interesting, it's
probably not going to be a good enough idea to compete with what
you're doing, because your idea is working even when the number of
dimensions is very small, and this seems like it would be a
significantly more expensive test. So the question is probably whether
there are cases where your cheaper test loses too much. I'm not sure
of the answer. My intuition is that join planning is pretty hard and
that shortcuts will tend to have cases where they behave really badly,
but my intuition is also that we are wasting a whole lot of CPU cycles
testing cases that are nearly equivalent to each other. I'm not sure
which of those two intuitions is more correct in this case.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2026-06-01 14:41:35 Re: Heads Up: cirrus-ci is shutting down June 1st
Previous Message Nisha Moond 2026-06-01 13:44:12 Fix column privileges for pg_subscription.subwalrcvtimeout