Re: apply_scanjoin_target_to_paths and partitionwise join

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, arne(dot)roland(at)malkut(dot)net, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject: Re: apply_scanjoin_target_to_paths and partitionwise join
Date: 2025-10-28 20:17:08
Message-ID: CA+TgmoYN0kWfhmGZeN-t=tFJY3tbDhfS6AUz5Jtt4UBtFCjKVg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 27, 2025 at 5:12 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I haven't had a chance just yet to think through all the details of
> the proposed patch, but I now believe we should commit something along
> those lines. I still suspect that back-patching is unwise; even though
> I now agree with Ashutosh's claim that this is a bug, because previous
> experience with destabilizing plans in back-branches has not been
> good. Hence, I'm inclined to fix only master. I do think the comments
> in the patch need some work, and I plan to tackle that tomorrow.

It seems that, in the time sense this patch was originally posted,
it's been side-swiped by Richard Guo's commits 24225ad9aafc and
9b282a9359a1, with the result that the regression tests now fail with
the patch applied, and I'm not immediately certain how to clean that
up. I'm also not sure that the way the patch handles the test cases it
did adjust is optimal. Here is some preliminary analysis; opinions
appreciated.

With the patch as last posted applied, I see three regression test
failures. The first one is for this query:

explain (verbose, costs off)
select * from unique_tbl_p t1, unique_tbl_p t2
where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
order by t1.a, t2.a;

The second is for this query:

EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a
AND t1.a = t2.b ORDER BY t1.a, t2.b;

And the third one is for this query:

EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1
WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0))
AND t1.b = 0 ORDER BY t1.a;

Without the patch, all of the queries perform full partitionwise
joins, between three tables in the first and third cases and between
two tables (since that's all there are) in the second case. With the
patch, the first and third cases switch to performing one of the joins
partitionwise and one of the joins non-partitionwise, and the second
case switches to performing a non-partitionwise join. I think this
defeats the point of the test cases, so it probably needs to be fixed
somehow, but I'm not sure what kind of adjustment would be most
appropriate.

I tried adjusting these three queries to use COSTS ON, and compared
the costs with and without the fix:

Q1 fixed: Merge Join (cost=9.98..1417.93 rows=83503 width=16)
Q1 unfixed: Merge Append (cost=10.01..2283.64 rows=83503 width=16)
Q2 fixed: Merge Join (cost=5.96..7.86 rows=3 width=18)
Q2 unfixed: Sort (cost=11.52..11.53 rows=3 width=18)
Q3 fixed: Merge Join (cost=25.09..27.47 rows=1 width=13)
Q3 unfixed: Merge Append (cost=24.94..27.27 rows=3 width=13)

What we see here is that, in the case of Q1, the fix reduces the cost
by a large amount, which is the kind of thing you'd hope would happen
when you fix a costing a bug, although I haven't quite figured out why
we get such a large benefit. Likewise, in the second case, the cost
goes down with the fix, although not by a lot. That case is
interesting because the plan selected with the patch is a merge join
of appends of index scans, which of every possible plan shape is
probably the one that benefits least from being performed
partitionwise. If the merge join involved a sort, you'd expect the
partitionwise approach to win, since several smaller sorts figure to
cost less in total than one big sort; but here it doesn't, so there's
little room for the partitionwise nature of the operation to provide a
benefit, and apparently the planner thinks that, in fact, it doesn't.
But the Q3 change is really the most disturbing part of this -- the
cost actually goes up with the fix. I haven't figured out whether
that's due to some kind of round-off error or whether it's evidence
that the patch doesn't properly fix the bug. I wonder whether
Richard's rewrite of unique-ification requires some adjustment to the
patch.

Another test case failure that would have happened had Ashutosh not
compensated for it in the patch is with this query:

EXPLAIN (COSTS OFF)
SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER
BY x.id ASC LIMIT 10;

Now, currently, this chooses a partitionwise-join. The Limit node at
the top of the plan tree has a cost of 2.22 and the underlying Merge
Append node has a cost of 223.11. But if you apply the fix and not
Ashutosh's adjustments to the test case, then you get a
non-partitionwise join, where the Limit at the top of the plan tree
has a cost of 2.21 and the underlying Merge Append node has a cost of
223.10. Since we're just trying to decide whether to Append some Merge
Joins or Merge Join some Appends, it's not surprising that the costs
are very close. However, I also find it slightly discouraging in terms
of accepting Ashutosh's proposed fix. In the Q1 case, above, we
apparently reduce the cost specifically by not flushing the path list.
But here, we just end up picking a nearly equivalent path with a
nearly-equivalent cost. At least, that means the test case isn't
likely to be stable, and we could just patch around that, as Ashutosh
did, by suppressing partitionwise join (it is not clear whether this
compromises the goals of the test case, but it's not obvious that it
does). But it might also be taken as a worrying indication that plans
of this form are going to come out as either partitionwise or not
based on essentially random factors, which could be viewed as a flaw
in the approach. I'm not really sure which way to view it, and if is a
flaw in the approach, then I'm not sure what to do instead.

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Smith 2025-10-28 20:54:37 Re: PG18 GIN parallel index build crash - invalid memory alloc request size
Previous Message Thomas Munro 2025-10-28 20:03:33 Re: C11: should we use char32_t for unicode code points?