Re: should we have a fast-path planning for OLTP starjoins?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: should we have a fast-path planning for OLTP starjoins?
Date: 2025-12-27 15:20:21
Message-ID: 4cef9073-1887-46a6-938e-a7b767351dee@vondra.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/24/25 19:16, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sat, Dec 20, 2025 at 5:49 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> It seems we have to identify these joins _before_ we actually start the
>>> main optimizer. We can identify restriction joins since we see the
>>> restriction in the query, and we can identify neutral joins because of
>>> foreign keys. How do we identify expansion joins? Is it all the joins
>>> which are not the previous types?
>
>> We unfortunately have no way to identify these joins before we
>> actually start the main optimizer; that's not how the code works. I'm
>> not sure if there's a reasonable way to do better, because whether the
>> join inflates or reduces the row count can't be known independently of
>> the join order in general, even though in practice it often can.
>
> I think the core idea of this proposal is to improve cases where
> we *can* know that; where we can't, just do what we've always done.
> Not handling every case isn't a fatal objection, as long as the
> patch doesn't spend much time to discover that it can't help.
>

Right. This was conceived as a cheap opportunistic optimization. It has
to be cheap enough to not hurt queries that don't benefit it (which may
be most queries).

> Having said that, I'm starting to wonder whether "do this stuff in a
> separate pass before the main optimizer" is the wrong structural
> decision. Should we be injecting the logic at some later point
> where we've gathered more information? At least in principle,
> we should be able to build all base-relation Paths before we start
> to think about join order; would having those help?
>

It's entirely possible doing it in a separate pass before the "regular"
join order search is the wrong approach. But if wanted to do that later,
when would that be?

I don't think doing that after building baserels would be enough. What
additional information would that give us? AFAICS we already have
RelOptInfos for baserels with all the necessary info about foreign keys,
join clauses etc. The thing we don't have is RelOptInfos for the joins,
which means we can't use join_is_legal() etc.

An idea I floated some time back is "grouping" the dimension joins into
a single "group entry", and teach join_search_one_level() to expand it
into a sequence of joins whenever it hits it. It'd still consider all
join orders of the "other" joins (some of which may increase or decrease
the join cardinality).

Would this be a better structure? It'd be later than what we do now, and
we can't really do it any later, I think.

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2025-12-27 17:00:10 Re: Add a greedy join search algorithm to handle large join problems
Previous Message Tomas Vondra 2025-12-27 14:52:21 Re: Add the ability to limit the amount of memory that can be allocated to backends.