Quick Links

Re: should we have a fast-path planning for OLTP starjoins?

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-11-15 00:41:04
Message-ID:	8643a223-dc84-4366-ae2e-7aee76e11d3e@vondra.me
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 11/14/25 22:07, Bruce Momjian wrote:
> On Wed, Nov 12, 2025 at 04:39:50PM +0100, Tomas Vondra wrote:
>> Hi,
>>
>> Here's a v5, addressing some (but not all) of the things discussed
>> earlier in this thread.
>>
>> This does nothing about the "opposite" type of join, with many small
>> tables linking to a single "central" table. I'm not convinced it makes
>> sense to handle that together with starjoins. But I haven't tried yet.
>
> I read this thread and the patch. I have a few questions which might
> have already been answered but they used terminology I might not have
> understood. I want to explain what I think is happening and perhaps
> someone can tell me if these ideas are new or are already covered.
>
> So, assume a fact table with a primary-key first column, and ten more
> columns, each with its own dimension table.
>
> So, a star join would query the fact table with some filter, and then
> join each of the ten columns to its own dimension table, e.g.,
>
> fact.col2 joins to dim2.primary_key
> fact.col3 joins to dim3.primary_key
> ...
>
> and the problem is that the dimension tables don't join to each other,
> but only to the fact table, and our existing optimizer considers join
> orders of:
>
> F to D2
> F to D3
> ...
>
> and then
>
> F to D3
> F to D4
> ...
>
> and there are 10! possible combinations, and Tomas is saying that the
> dimension tables are almost all the same in their affect on the row
> count, so why bother to consider all 10! join orders. Also, some joins
> might increase the row count, so a foreign key guarantees only one row
> will be matched in the dimension.
>

Right. A join between F and a "dimension" has the same cardinality as F,
i.e. each row in F has one matching row in the dimension. It's entirely
irrelevant in which order we actually join the dimensions.

And then there may be some other joins that affect the cardinality, and
the question is what to do about these ...

> I found this code comment in the recent patch which is helpful:
>
> + * The query may involve joins to additional (non-dimension) tables, and
> + * those may alter cardinality in either direction. In principle, it'd be
> + * best to first perform all the joins that reduce join size, then join all
> + * the dimensions, and finally perform joins that may increase the join
> + * size. Imagine a joinlist:
> + *
> + * (D1, D2, A, B, F)
> + *
> + * with fact F, dimensions D1 and D2, and non-dimensions A and B. If A
> + * increases cardinality, and B does not (or even reduces it), we could
> + * use this join tree:
> + *
> + * (A, (D2, (D1, (B, F))))
> + *
> + * For now we simply leave the dimension joins at the end, assuming
> + * that the earlier joins did not inflate the join too much.
>
> And then there is the problem of detecting when this happens.
>
> I think my big question is, when we join A->B->C->D, we do a lot of work
> in the optimizer to figure out what order to use, but when we do A->B,
> A->C, A->D, why are we spending the same amount of optimizer effort?
>

I'm sorry, I don't quite understand what's the question here. What does
A->B->C->D mean, exactly?

The standard join algorithm is (intentionally) generic, it handles all
kinds of joins, and it simply doesn't have any special cases for joins
with a particular structure (like a starjoin).

> Could we just order B, C, D in order of which have restrictions, or
> based on size? I assume we can't because we don't know if A->C or
> another join would increase the number of rows, and this patch is saying
> if there is a foreign key relationship, they can't increase the rows so
> just short-circuit and order them simply. Is that accurate?
>
> Crazy question, if we have A->B, A->C, and A->D, why can't we just sort
> B,C,D based in increasing order of adding rows to the result, and just
> use that ordering, without requiring foreign keys?
>

Yeah, that's mostly the idea one of the comments in the patch suggests.
To do the joins that reduce the cardinality first, then the dimensions,
and finally the joins that increase the cardinality.

However, the examples make it look like we're joining pairs of tables,
but that's not necessarily true. The join may be between F and relation
that is really a join itself. And now you need to know how this join
changes the cardinality, which is more expensive than when only looking
at joins of pairs of tables.

So I think we'd need to first identify these "independent join groups"
first. But that seems non-trivial, because we don't know which table to
start from (we don't know what's the "F" table).

I'm sure there's an algorithm to decompose the join tree like this, but
I'm not sure how expensive / invasive it'd be. The premise of this patch
is that it's a cheap optimization, that doesn't need to do this.

It simply "peels off" the dimensions from the join, based on things it
can prove locally, without having to decompose the whole jointree.

In any case, this is my current understanding of the problem. It's
entirely possible I'm missing some obvious solution, described in a
paper somewhere.

regards

--
Tomas Vondra

In response to

Re: should we have a fast-path planning for OLTP starjoins? at 2025-11-14 21:07:36 from Bruce Momjian

Responses

Re: should we have a fast-path planning for OLTP starjoins? at 2025-11-15 02:52:40 from Bruce Momjian

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	John Naylor	2025-11-15 02:47:11	Re: tuple radix sort
Previous Message	Sami Imseih	2025-11-15 00:25:36	Re: Report oldest xmin source when autovacuum cannot remove tuples