Re: Removing unneeded self joins

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2023-10-12 11:32:27
Message-ID: CAPpHfdsQkY0Gu7bQ9Ekdyzo_5Xxz74TYKbwmk1on1WAASpxXcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 4/10/2023 14:34, Alexander Korotkov wrote:
> > > Relid replacement machinery is the most contradictory code here. We used
> > > a utilitarian approach and implemented a simplistic variant.
> >
> > > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> > > > they are not necessary. [1] points that infrastructure from [2] might
> > > > be useful. The patchset from [2] seems committed mow. However, I
> > > > can't see it is directly helpful in this matter. Could we just skip
> > > > adding IS NOT NULL clause for the columns, that have
> > > > pg_attribute.attnotnull set?
> > > Thanks for the links, I will look into that case.
> To be more precise, in the attachment, you can find a diff to the main
> patch, which shows the volume of changes to achieve the desired behaviour.
> Some explains in regression tests shifted. So, I've made additional tests:
>
> DROP TABLE test CASCADE;
> CREATE TABLE test (a int, b int not null);
> CREATE UNIQUE INDEX abc ON test(b);
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b;
> CREATE UNIQUE INDEX abc1 ON test(a,b);
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b;
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
> DROP INDEX abc1;
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
>
> We have almost the results we wanted to have. But in the last explain
> you can see that nothing happened with the OR clause. We should use the
> expression mutator instead of walker to handle such clauses. But It
> doesn't process the RestrictInfo node ... I'm inclined to put a solution
> of this issue off for a while.

OK. I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).

I made improvements over the code. Mostly new comments, grammar
corrections of existing comments and small refactoring.

Also, I found that the suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it. That helps to evade quadratic complexity with
large number of relations.

Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found. It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).

Links
1. https://www.postgresql.org/message-id/CAKJS1f8ySSsBfooH3bJK7OD3LBEbDb99d8J_FtqDd6w50p-eAQ%40mail.gmail.com
2. https://www.postgresql.org/message-id/96f66ae3-df10-4060-9844-4c9633062cd3%40yandex.ru

------
Regards,
Alexander Korotkov

Attachment Content-Type Size
0001-Remove-useless-self-joins-v44.patch application/octet-stream 100.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-10-12 11:41:01 RE: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Amit Kapila 2023-10-12 11:14:09 pg_upgrade's interaction with pg_resetwal seems confusing