Re: should we have a fast-path planning for OLTP starjoins?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: should we have a fast-path planning for OLTP starjoins?
Date: 2025-11-23 14:39:31
Message-ID: a5a44ba8-4812-4614-9f69-5965c2666b81@vondra.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/21/25 21:14, Tom Lane wrote:
> I spent a little time staring at the v5 patches. Obviously there
> are a bunch of minor details to be verified, which you've carefully
> provided XXX comments about, and I didn't really go through those
> yet. There are two big-picture questions that are bothering me:
>
> 1. I do not think I believe the premise that the dimension tables
> typically won't have restriction clauses. ISTM that a typical
> query might be like
>
> select sum(o.total_price) from
> orders o
> join customers c on c.id = o.c_id
> join products p on p.id = o.p_id
> where c.customer_name = 'Wile E Coyote'
> and p.product_name = 'Rocket Skates';
>
> The only reason to join a dimension table that lacks a restriction
> clause is if you need some of its fields in the output, which you
> might but I'm not sure that's such a common case. (Have you got
> evidence to the contrary?) So I feel like we're not going to be
> getting all that much win if we are not willing to treat such tables
> as dimension tables. We could do something simplistic like order
> those dimensions by the selectivity of their baserestrict clauses,
> joining the most-restricted ones first and any restriction-free ones
> last.
>

Good question. I don't have a great evidence such joins to dimensions
(without additional restrictions) are a common case. It's partially a
guess and partially based on my past experience.

I have seen a lot of such joins in analytical workloads, where the join
is followed by an aggregation, with GROUP BY referencing attributes from
the dimensions. Of course, that may be an argument against worrying
about the planning too much, because with enough data the timing is
going to be dominated by the join/aggregation execution. However, it's
surprising how little data many analytical workloads actually access, so
it's not that clear.

The other use case I've been thinking about is poorly written queries,
joining more tables than needed. A traditional example is an ORM loading
more data than needed, to load the whole "object". I don't know how
prevalent this is today - it used to be a pretty common issue, and I
doubt it improved. I think it's not that different from the self-join
removal (the tradeoffs may be different, of course). I realize we try
not to add complexity for such cases, especially if it might hurt well
written queries.

Actually, I initially investigated at the opposite example, i.e. all
dimensions joining to the fact.id, see create-2/select-2 scripts. And
then I realized starjoins have mostly the same issue. But it's true the
v5 patch does not actually help this original query.

> 2. I'm pretty un-excited about the 0002 patch altogether. I'm having
> a hard time visualizing cases where it helps, other than left joins
> to dimension tables which I don't really think are common either.
> I did a bit of poking around on the net and found that it seems to
> be common to restrict star-join optimizations to equijoins (e.g.
> SAP says explicitly that they only handle that case). I think we'd
> be better off to focus on the allow-baserestrict-clauses extension
> than the allow-join-order-restrictions extension.
>

I recall seen such queries (with LEFT joins) in analytics workloads, but
it's definitely less common than inner starjoins. So I agree focusing on
allowing baserestrict clauses is probably more useful/important.

FWIW I tried searching for more info too, but all the SAP pages
suggested by google return 404 to me :-(

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2025-11-23 14:45:35 [PATCH] Avoid pallocs in async.c's SignalBackends critical section
Previous Message Fujii Masao 2025-11-23 14:05:50 Re: Add MERGE and VALUES statements to tab completion for PREPARE