Re: apply_scanjoin_target_to_paths and partitionwise join

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Arne Roland <arne(dot)roland(at)malkut(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Subject: Re: apply_scanjoin_target_to_paths and partitionwise join
Date: 2025-10-31 06:02:44
Message-ID: CAMbWs4-daM-1pid3+EWovgX3L4u3taH7d0CFELnS46o_7Dy1DQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 31, 2025 at 4:57 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> And just like that, I found another way that this can happen. Consider
> this query from the regression tests:
>
> SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a
> AND t1.a = t2.b ORDER BY t1.a, t2.b;

I observed something interesting about this query. If you swap the
two join conditions, you should theoretically get a semantically
equivalent query.

SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b
AND t1.a = t2.a ORDER BY t1.a, t2.b;

However, unlike the original query, the planner estimates that a
partitionwise join is cheaper than a non-partitionwise join for this
version, with costs of 12.74 and 14.24, respectively.

What's more surprising is that the non-partitionwise plans for these
two queries differ significantly in cost. The original query has an
estimated cost of 7.86, while the modified version's cost is 14.24.
This also indicates that the discrepancy is unrelated to partitionwise
join.

I looked into this a bit and traced it to mergejoinscansel(). This
function estimates the value ranges of both inputs to determine how
much of the input will actually be read, since a merge join can stop
as soon as either input is exhausted. For the original query, the
merge clause is "t1.a = t2.a", and the function estimates the maximum
value of the right-side variable (t2.a) as 24. For the modified
query, the merge clause becomes "t1.a = t2.b", and the estimated
maximum value of the right-side variable (t2.b) is 597.

This isn't actually incorrect given how get_variable_range() works:

select max(a), max(b) from prt2;
max | max
-----+-----
24 | 597
(1 row)

However, the logic overlooks the fact that t2.a is always constrained
to be equal to t2.b, meaning their value ranges should be identical.

I think we may need to do something here. However, it's a bit
off-topic for this thread. I'm just noting it here in case anyone
else is interested.

- Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shinya Kato 2025-10-31 06:31:40 Report oldest xmin source when autovacuum cannot remove tuples
Previous Message Hayato Kuroda (Fujitsu) 2025-10-31 05:59:59 RE: How can end users know the cause of LR slot sync delays?