Re: apply_scanjoin_target_to_paths and partitionwise join

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Arne Roland <arne(dot)roland(at)malkut(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject: Re: apply_scanjoin_target_to_paths and partitionwise join
Date: 2025-10-30 15:52:38
Message-ID: CA+TgmoZ6zmGo9r4EXi3WEzUZiFr=cqH89Hse+TGmCYU-bR=q5w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 29, 2025 at 9:23 PM Arne Roland <arne(dot)roland(at)malkut(dot)net> wrote:
> The main factor of your example is, that the amount of rows handled by the (Merge) Append is different.

Right. Although that's the main thing here, I am inclined to suspect
there are other ways to hit this problem, maybe ways that are more
likely to happen in the real world, because...

> My second sentence just captured the mundane observation, if the join has significantly more tuples, than any base relation, the place of the (Merge) Append might be more relevant. If I join everything with a generate_series(1, 30000) I get more tuples to process.

...as you imply, joins that inflate the row count are somewhat
uncommon. They definitely do happen, but they're not the most typical
pattern, and there might well be other reasons why a partitionwise
join fails to win that we haven't figured out yet. These could even be
cases where, for example, a certain optimization that works in the
non-partitionwise case is not preserved in the partitionwise case. I
feel like I now understand *one* case where Ashutosh's patch can make
a demonstrable positive difference, but whether that's the only case
that exists seems quite uncertain.

> I'd like to make one more side note about this example: The planner punishes the partitionwise join for having an extra node, that emits N rows (three Hash joins + Append vs two Appends + Hash Join). This plan is chosen because of the cpu_tuple_cost. I'm happy it picks the plan with the smaller memory footprint, but in my real world experience for a timing based approach the default cpu_tuple_cost tends to be too high to get a fair comparison between partitionwise and non partitionwise joins.

Have you localized the problem to cpu_tuple_cost specifically, vs.
cpu_index_tuple_cost or cpu_operator_cost? I've generally found that I
need to reduce random_page_cost and seq_page_cost significantly to
avoid getting sequential scans when index scans would be more
reasonable, but that goes in the opposite direction as what you
suggest here, in that it brings the I/O and CPU costs closer together,
whereas your suggestion would push them further apart. I remember that
Kevin Grittner used to say that the default value of this parameter
was bad, too, but he recommended *raising* it:

https://www.postgresql.org/message-id/1385148245.49487.YahooMailNeo%40web162904.mail.bf1.yahoo.com
https://www.postgresql.org/message-id/4FF179780200002500048CBD@gw.wicourts.gov
https://www.postgresql.org/message-id/CACjxUsNp4uEx3xsunw4wVpBDVomas7o6hnv_49bSbaz-HAVdyA%40mail.gmail.com

I don't actually know what's best in terms of settings in this area. I
don't have experience tuning for partitionwise join specifically.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maxim Orlov 2025-10-30 16:17:00 Re: POC: make mxidoff 64 bits
Previous Message Nico Williams 2025-10-30 15:32:23 Re: Channel binding for post-quantum cryptography