Re: apply_scanjoin_target_to_paths and partitionwise join

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, arne(dot)roland(at)malkut(dot)net, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Subject: Re: apply_scanjoin_target_to_paths and partitionwise join
Date: 2025-10-31 18:40:19
Message-ID: CA+TgmoYXDZyVTU8_ibrh+LtQ5ECgw8CL-HgMHpLLXpshmSf7AA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 31, 2025 at 8:21 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > Do you have any thoughts on how we might adjust these test cases?
>
> For Q1, the plan change does not appear to compromise its original
> purpose. The test case is meant to verify that unique-ification works
> correctly with partitionwise joins, so as long as the join to t3 is
> performed in a partitionwise manner, we're fine.
>
> For Q2, I suspect that the cost estimation issue in mergejoin may be
> affecting the planner's choice of plan. When I set enable_mergejoin
> to off, I was able to get a partitionwise join plan again. Therefore,
> I think we can modify the test case to manually disable mergejoin for
> this query.
>
> For Q3, you can get a plan with full partitionwise joins again by
> removing either the clause "t1.c = 0" or "t1.b = 0", and doing so
> doesn't change the query's output. You can also get a fully
> partitionwise join plan by removing both clauses, though in that case
> the query output becomes too large to include in a test case.

Thank you, this is extremely helpful and I very much appreciate you
putting in the time to assemble these recommendations. I adjusted
these test cases as per these suggestions. I then realized that we
also need to examine the two test cases that the existing patch
already adjusted. In one of those cases, Ashutosh simply disabled
partitionwise joins, but looking up a little higher in the file, I saw
from the comments that the test case was intended to test a
partitionwise join, so that didn't seem like a good solution. Instead,
I adjusted the test case to run with merge joins turned off, as you
suggested for Q2. The resulting plan is slightly different from the
current one, but it seems like it is still testing what the original
author intended to test. But there's still one problematic case, which
is different from all the ones we've examined so far:

EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN
plt2_adv t2
ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
coalesce(t1.a, 0)
% 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a, t3.a;

Currently, this produces a partitionwise join with each sub-join being
implemented as a hash join. With the patch, it switches to using a
non-partitionwise hash join. It appears to me that the issue is that
the cardinality estimation is less accurate when planning in
non-partitionwise fashion. If we disregard the WHERE clause for the
moment for simplicity and just look at the joins, they produce a total
of 305 rows: 55 from the first set of parittions, and 250 from the
second set of partitions. With partitionwise join, we estimate that
the join between the first pair of partitions will produce 42 rows and
the second set is perfectly estimated at 250 for a total estimate of
292. Without partitionwise join, we instead estimate 225 rows. The
planner naturally estimates that processing fewer rows will be less
expensive than processing more of them, and so picks the
non-partitionwise plan.

I think this is the only case we've seen so far where the proposed fix
could lead to a regression. In the first type of case, where running a
lot of tuples through an Append or Merge Append node, the planner is
correct to consider abandoning partitionwise join if so doing will
ameliorate that problem to a sufficient degree. In the second type of
case, where a non-partitionwise plan is chosen because we realize that
a Merge Join could stop early, I don't think it should make a whole
lot of difference. We'll only believe we can stop reading one side of
the input early if there's no Sort node present, and as discussed
earlier, Merge Joins without Sorts don't really benefit from being
done partitionwise; it's kind of six of one, half a dozen of the
other. Maybe there's some subtly here that I haven't quite understood,
but it doesn't seem like this case is a big problem. However, in this
situation, the proposed fix really could mess somebody up: there is
ever reason to believe that the partitionwise plan is strictly better
than the non-partitionwise plan, and we only choose the
non-partitionwise plan due to bad estimation. That seems like it could
be a common scenario, because I think we should expect that
partitionwise estimates will, on average, be more estimate than
non-partitionwise estimates.

Said differently, this kind of scenario where we don't really want the
non-partitionwise path to win even if it looks cheaper on paper. I
don't really know what to do about that, though. The only way to
improve the non-partitionwise estimate would be to do partitionwise
estimation all the time, regardless of whether we think we're going to
do a partitionwise join. Maybe that's a good idea and maybe it isn't,
but even if it is, it's probably about four orders of magnitude more
code change than this patch is presently contemplating, so linking the
two of them doesn't seem like a good plan. One option is to just go
ahead and commit the fix as proposed, and hope that regressions aren't
too common. The only other idea that I have at the moment is to invent
something like enable_partitionwise_join = { off | on | always }, thus
giving a user who is harmed by this change a way to restore the
current behavior.

Thoughts?

(The attached patch leaves this problematic test case failing and
makes the others pass as per the above discussion.)

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Don-t-reset-the-pathlist-of-partitioned-joinrels.patch application/octet-stream 15.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Matheus Alcantara 2025-10-31 19:02:05 Re: postgres_fdw: Use COPY to speed up batch inserts
Previous Message Heikki Linnakangas 2025-10-31 18:14:38 Re: meson's in-tree libpq header search order vs -Dextra_include_dirs