Re: Pathify RHS unique-ification for semijoin planning

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Andy Fan <zhihuifan1213(at)163(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Subject: Re: Pathify RHS unique-ification for semijoin planning
Date: 2025-07-04 01:41:35
Message-ID: CAMbWs49+V3m8ghSDUyUBEziXhBgfRZ8GCLu-kWZqGpiXW8i=Bw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 3, 2025 at 7:06 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> This patch does not apply again, so here is a new rebase.
>
> This version also fixes an issue related to parameterized paths: if
> the RHS has LATERAL references to the LHS, unique-ification becomes
> meaningless because the RHS depends on the LHS, and such paths should
> not be generated.

(The cc list is somehow lost; re-ccing.)

FWIW, I noticed that the row/cost estimates for the unique-ification
node on master can be very wrong. For example:

create table t(a int, b int);
insert into t select i%100, i from generate_series(1,10000)i;
vacuum analyze t;
set enable_hashagg to off;

explain (costs on)
select * from t t1, t t2 where (t1.a, t2.b) in
(select a, b from t t3 where t1.b is not null offset 0);

And look at the snippet from the plan:

(on master)
-> Unique (cost=934.39..1009.39 rows=10000 width=8)
      -> Sort (cost=271.41..271.54 rows=50 width=8)
            Sort Key: "ANY_subquery".a, "ANY_subquery".b
            -> Subquery Scan on "ANY_subquery" (cost=0.00..270.00 rows=50 width=8)

The row estimate for the subpath is 50, but it increases to 10000
after unique-ification. That cannot be right: removing duplicates
never yields more rows than the input.

This issue does not occur with this patch:

(on patched)
-> Unique (cost=271.41..271.79 rows=50 width=8)
      -> Sort (cost=271.41..271.54 rows=50 width=8)
            Sort Key: "ANY_subquery".a, "ANY_subquery".b
            -> Subquery Scan on "ANY_subquery" (cost=0.00..270.00 rows=50 width=8)
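
As a rough illustration of the sanity check applied to the two snippets
above, here is a small sketch that runs outside PostgreSQL. The helper
names (estimated_rows, unique_estimate_is_sane) are hypothetical and not
part of any PostgreSQL API; the plan lines are copied from the snippets.
It only encodes the assumption that a unique-ification node should not be
estimated to emit more rows than its input.

```python
import re


def estimated_rows(plan_line):
    """Hypothetical helper: extract the rows=N estimate from one EXPLAIN text line."""
    match = re.search(r"rows=(\d+)", plan_line)
    if match is None:
        raise ValueError("no row estimate in: " + plan_line)
    return int(match.group(1))


def unique_estimate_is_sane(unique_line, input_line):
    """Removing duplicates can only keep or drop rows, so the Unique
    estimate should not exceed the estimate of its input node."""
    return estimated_rows(unique_line) <= estimated_rows(input_line)


# Plan lines taken from the master and patched snippets.
master_unique = '-> Unique (cost=934.39..1009.39 rows=10000 width=8)'
patched_unique = '-> Unique (cost=271.41..271.79 rows=50 width=8)'
sort_input = '-> Sort (cost=271.41..271.54 rows=50 width=8)'

print(unique_estimate_is_sane(master_unique, sort_input))   # False
print(unique_estimate_is_sane(patched_unique, sort_input))  # True
```

The check fails for the master plan (10000 > 50) and passes for the
patched plan (50 <= 50).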

Thanks
Richard
