| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: should we have a fast-path planning for OLTP starjoins? |
| Date: | 2025-11-28 23:34:07 |
| Message-ID: | aea18094-c4b9-4633-a5e3-4d776dbb34ec@vondra.me |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 11/28/25 23:50, Robert Haas wrote:
> On Fri, Nov 28, 2025 at 2:21 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:=
>> The patch identifies dimensions without restrictions, moves them aside,
>> and does regular join search on the rest of the relations (some of which
>> may be dimensions with restrictions).
>>
>> So the presence of dimensions with restrictions does not disable the
>> optimization entirely, it just means the dimensions with restrictions
>> won't benefit from it. So in the worst case, it'll perform just like
>> before (assuming the optimization is kept cheap enough).
>
> I'm guessing that the reason you're limiting this to relations without
> restrictions is so that you don't have to reason about the join
> causing cardinality changes. But I don't quite understand how you
> avoid having that problem anyway. For example, suppose that we have a
> fact table F which is joined to relations S1, S2, D1, D2, I1, and I2.
> The joins to S1 and S2 are joins on FKs to unconstrained dimension
> tables; i.e. cardinality stays the same. The joins to D1 and D2 are
> joins on FKs to constrained dimension tables, i.e. cardinality
> decreases. The joins to I1 and I2 on average match more than once,
> i.e. cardinality increases. The optimal join order is presumably
> something like F-D1-D2-S1-S2-I1-I2, so that the number of rows flowing
> through each level of the join tree is kept as small as possible, but
> how do you achieve this if the joins to S1 and S2 are set aside? In
> the presence of row-count-inflating joins, it's wrong to suppose that
> S1 and S2 should be joined at the end.
>
Yes, that's certainly possible. It was discussed at the very beginning
of this thread, and at that point there was a suggestion to keep it
simple and just push the joins to the end:
https://www.postgresql.org/message-id/1751009.1738974197@sss.pgh.pa.us
I'm not claiming that's good enough, though. Maybe we should reconsider
and it needs a better solution.
> What seems more likely to be true is that we should perform the join
> to S1 and the join to S2 consecutively. It's very common in my
> experience for complex reporting queries to have a bunch of joins the
> results of which matter very little: each one changes the cardinality
> very little or not at all, and so any ordering of those joins produces
> essentially the same result, and probably it's best to do them all at
> whatever point in the join sequence the cardinality of the other input
> is at lowest. However, I'm not sure that even this is guaranteed. For
> instance, suppose that in the above example there's no index on the
> column of F that joins to S2. Could it be that the only reasonable way
> to join to S2 is a merge join? If so, it might be best to start by
> join F to S2 using an index scan on each, and then continue by joining
> to D1, D2, S1, I1, I2 in that order.
>
Good points.
The first part reminds me the approach I mentioned a couple messages
back, We might preprocess the join lists, but instead of "removing the
dimensions" from the search, we'd group them together, and treat each
group as a single item in join_search_one_level. Then, whenever
join_search_one_level hits the group, it'd expand it into a sequence of
joins. I have not tried that yet, but it seems doable ...
However, I'm not sure about your second point - what if the join order
matters after all, e.g. because some join order can leverage ordering
(pathkeys) of the inputs, or produces ordering better for the later joins?
> Every time I start to think about this kind of optimization, I find
> myself getting hung up on corner cases like these which are, maybe,
> not all that probable, but which I think people almost certainly do
> rely on the planner to get right. I don't know what to do about that.
> I certainly agree with the idea that we waste a lot of energy
> searching through functionally identical join orders and that we
> should find a way to avoid that, but I'm a little suspicious that
> avoiding regressions, or even finding out whether we've introduced
> any, will prove difficult.
>
True. I started working on this with two assumptions: (a) we can detect
cases when this is guaranteed to be the optimal join order, and (b) it
would be cheap to do so. Maybe one of those assumptions is not correct.
thanks
--
Tomas Vondra
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2025-11-29 01:54:25 | Re: pgsql: Inline pg_ascii_tolower() and pg_ascii_toupper(). |
| Previous Message | Robert Haas | 2025-11-28 22:50:34 | Re: should we have a fast-path planning for OLTP starjoins? |