Re: Pathify RHS unique-ification for semijoin planning

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pathify RHS unique-ification for semijoin planning
Date: 2025-05-28 01:58:27
Message-ID: CAMbWs4-a2BRgVdYUdMbUVM07nzCnctu23ErK+ye+YD6+9JHi-g@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 22, 2025 at 4:05 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> Therefore, I'm thinking that maybe we could create a new RelOptInfo
> for the RHS rel to represent its unique-ified version, and then
> generate all worthwhile paths for it, similar to how it's done in
> create_distinct_paths(). Since this is likely to be called repeatedly
> on the same rel, we can cache the new RelOptInfo in the rel struct,
> just like how we cache cheapest_unique_path currently.
>
> To be concrete, I'm envisioning something like the following:
>
>     if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
> -       create_unique_path(root, rel2, rel2->cheapest_total_path,
> -                          sjinfo) != NULL)
> +       (rel2_unique = create_unique_rel(root, rel2, sjinfo)) != NULL)
>
> ...
>
> -       add_paths_to_joinrel(root, joinrel, rel1, rel2,
> -                            JOIN_UNIQUE_INNER, sjinfo,
> +       add_paths_to_joinrel(root, joinrel, rel1, rel2_unique,
> +                            JOIN_INNER, sjinfo,
>                              restrictlist);
> -       add_paths_to_joinrel(root, joinrel, rel2, rel1,
> -                            JOIN_UNIQUE_OUTER, sjinfo,
> +       add_paths_to_joinrel(root, joinrel, rel2_unique, rel1,
> +                            JOIN_INNER, sjinfo,
>                              restrictlist);
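
The caching idea in the quoted text (build the unique-ified rel once, then hand back the same one on later calls) can be sketched in isolation roughly as follows. This is a toy illustration only, not code from the patch: ToyRel, its fields, and toy_create_unique_rel() are hypothetical stand-ins for RelOptInfo and create_unique_rel(), and the real function also takes root and sjinfo and may return NULL when unique-ification is not possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for RelOptInfo; not the planner's real struct. */
typedef struct ToyRel
{
    int             relid;       /* identifies the underlying relation */
    bool            is_unique;   /* true for the unique-ified version */
    struct ToyRel  *unique_rel;  /* cached unique-ified version, or NULL */
} ToyRel;

/*
 * Return the unique-ified version of "rel", building it on first use and
 * caching it in the parent so that repeated calls reuse the same object.
 * Returns NULL only if allocation fails.
 */
static ToyRel *
toy_create_unique_rel(ToyRel *rel)
{
    ToyRel     *unique;

    assert(rel != NULL);

    /* Fast path: a previous call already built it. */
    if (rel->unique_rel != NULL)
        return rel->unique_rel;

    unique = malloc(sizeof(ToyRel));
    if (unique == NULL)
        return NULL;

    unique->relid = rel->relid;
    unique->is_unique = true;
    unique->unique_rel = NULL;

    /* Remember the result for subsequent calls on the same rel. */
    rel->unique_rel = unique;
    return unique;
}
```

Calling toy_create_unique_rel() twice on the same ToyRel returns the same pointer, which is the property that makes repeated calls from the join search cheap.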

Here is a WIP draft patch based on this idea. It retains
JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to help determine whether the
inner relation is provably unique, but otherwise removes most of the
code related to these two join types.

Additionally, the T_Unique path now has the same meaning for both
semijoins and DISTINCT clauses: it represents adjacent-duplicate
removal on presorted input. This patch unifies their handling by
sharing the same data structures and functions.
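
To illustrate what "adjacent-duplicate removal on presorted input" means, here is a minimal standalone sketch. It is not executor or planner code; unique_adjacent() is a hypothetical helper that works on a plain int array, whereas the real node compares the unique-ification columns of successive tuples.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: collapse runs of equal adjacent values in place and
 * return the new length.  The result is duplicate-free only if the input
 * is already sorted (or at least grouped) on the compared key, which is
 * why this style of unique-ification needs presorted input.
 */
static size_t
unique_adjacent(int *vals, size_t n)
{
    size_t      out = 0;

    assert(vals != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        /* Emit a value only when it differs from the last one emitted. */
        if (out == 0 || vals[i] != vals[out - 1])
            vals[out++] = vals[i];
    }
    return out;
}
```

On sorted input the surviving values stay in sorted order, so the ordering of the input carries through to the output. On unsorted input, non-adjacent duplicates are not removed.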

There are a few plan diffs in the regression tests. As far as I can
tell, the changes are improvements. One of them is caused by the fact
that we now consider parameterized paths in unique-ified cases. The
rest are mostly a result of now preserving pathkeys for unique paths.

This patch is still a work in progress. Before investing too much
time into it, I'd like to get some feedback on whether it's heading in
the right direction.

Thanks
Richard

Attachment: v1-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch (application/octet-stream, 83.9 KB)
