| From: | Richard Guo <guofenglinux(at)gmail(dot)com> | 
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Cc: | Arne Roland <arne(dot)roland(at)malkut(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> | 
| Subject: | Re: apply_scanjoin_target_to_paths and partitionwise join | 
| Date: | 2025-10-31 06:02:44 | 
| Message-ID: | CAMbWs4-daM-1pid3+EWovgX3L4u3taH7d0CFELnS46o_7Dy1DQ@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Fri, Oct 31, 2025 at 4:57 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> And just like that, I found another way that this can happen. Consider
> this query from the regression tests:
>
> SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a
> AND t1.a = t2.b ORDER BY t1.a, t2.b;
I observed something interesting about this query.  If you swap the
two join conditions, you should theoretically get a semantically
equivalent query.
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b
AND t1.a = t2.a ORDER BY t1.a, t2.b;
However, unlike the original query, the planner estimates that a
partitionwise join is cheaper than a non-partitionwise join for this
version, with costs of 12.74 and 14.24, respectively.
What's more surprising is that the non-partitionwise plans for these
two queries differ significantly in cost.  The original query has an
estimated cost of 7.86, while the modified version's cost is 14.24.
This also indicates that the discrepancy is unrelated to partitionwise
join.
I looked into this a bit and traced it to mergejoinscansel().  This
function estimates the value ranges of both inputs to determine how
much of the input will actually be read, since a merge join can stop
as soon as either input is exhausted.  For the original query, the
merge clause is "t1.a = t2.a", and the function estimates the maximum
value of the right-side variable (t2.a) as 24.  For the modified
query, the merge clause becomes "t1.a = t2.b", and the estimated
maximum value of the right-side variable (t2.b) is 597.
This isn't actually incorrect given how get_variable_range() works:
select max(a), max(b) from prt2;
 max | max
-----+-----
  24 | 597
(1 row)
However, the logic overlooks the fact that t2.a is always constrained
to be equal to t2.b, meaning their value ranges should be identical.
I think we may need to do something here.  However, it's a bit
off-topic for this thread.  I'm just noting it here in case anyone
else is interested.
- Richard
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Shinya Kato | 2025-10-31 06:31:40 | Report oldest xmin source when autovacuum cannot remove tuples | 
| Previous Message | Hayato Kuroda (Fujitsu) | 2025-10-31 05:59:59 | RE: How can end users know the cause of LR slot sync delays? |