Re: Removing unneeded self joins

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2023-06-30 10:50:19
Message-ID: c97574c4-13e6-9107-26ab-af2bd626fc85@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/4/2023 02:30, Gregory Stark (as CFM) wrote:
> On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek <michal(at)kleczek(dot)org> wrote:
>>
>> Hi All,
>>
>> I just wanted to ask about the status and plans for this patch.
>> I can see it being stuck at “Waiting for Author” status in several commit tests.
>
> Sadly it seems to now be badly in need of a rebase. There are large
> hunks failing in the guts of analyzejoins.c as well as minor failures
> elsewhere and lots of offsets which need to be reviewed.
>
> I think given the lack of activity it's out of time for this release
> at this point. I'm moving it ahead to the next CF.
Hi,

Version 41 is heavily remade of the feature:

1. In previous versions, I tried to reuse remove_rel_from_query() for
both left and self-join removal. But for now, I realized that it is a
bit different procedures which treat different operations. In this
patch, only common stages of the PlannerInfo fixing process are united
in one function.
2. Transferring clauses from the removing table to keeping one is more
transparent now and contains comments.
3. Equivalence classes update procedure was changed according to David's
commit 3373c71. As I see, Tom has added remove_rel_from_eclass since the
last v.40 version, and it looks pretty similar to the update_eclass
routine in this patch.

It passes regression tests, but some questions are still open:
1. Should we look for duplicated or redundant clauses (the same for
eclasses) during the clause transfer procedure? On the one side, we
reduce the length of restrict lists that can impact planning or
executing time. Additionally, we improve the accuracy of cardinality
estimation. On the other side, it is one more place that can make
planning time much longer in specific cases. It would have been better
to avoid calling the equal() function here, but it's the only way to
detect duplicated inequality expressions.
2. Could we reuse ChangeVarNodes instead of sje_walker(), merge
remove_rel_from_restrictinfo with replace_varno?
3. Also, I still don't finish with the split_selfjoin_quals: some
improvements could be made.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment Content-Type Size
v41-0001-Remove-self-joins.patch text/plain 92.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2023-06-30 11:08:45 Re: EBCDIC sorting as a use case for ICU rules
Previous Message Palak Chaturvedi 2023-06-30 10:46:50 Extension Enhancement: Buffer Invalidation in pg_buffercache