Re: Removing unneeded self joins

From: "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2023-10-13 08:56:37
Message-ID: 29ddec15-a83f-4f24-b069-6e93c189b169@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


>> Also I've incorporated improvements from Alena Rybakina except one for
>> skipping SJ removal when no SJ quals is found.  It's not yet clear for
>> me if this check fix some cases. But at least optimization got skipped
>> in some useful cases (as you can see in regression tests).
>
> Agree. I wouldn't say I like it too. But also, I suggest skipping some
> unnecessary assertions proposed in that patch:
> Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the
> negative numbers, at least?
> Assert(is_opclause(orinfo->clause)); - above we skip clauses with
> rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
> checked as is_opclause.
> All these changes (see in the attachment) are optional.
>
I don't mind about asserts, maybe I misunderstood something in the patch.

About skipping SJ removal when no SJ quals is found, I assume it is
about it:

split_selfjoin_quals(root, restrictlist, &selfjoinquals,
                                  &otherjoinquals, inner->relid,
outer->relid);

+            if (list_length(selfjoinquals) == 0)
+             {
+                 /*
+                  * XXX:
+                  * we would detect self-join without quals like 'x==x'
if we had
+                  * an foreign key constraint on some of other quals
and this join
+                  * haven't any columns from the outer in the target list.
+                  * But it is still complex task.
+                  */
+                 continue;
+             }

as far as I remember, this is the place where it is checked that the SJ
list is empty and it is logical, in my opinion, that no transformations
should be performed if no elements are found for them.

As for the cases where SJ did not work, maybe this is just right if
there are no elements for processing these cases. I'll try to check or
come up with tests for them. If I'm wrong, write.

On 11.10.2023 06:51, Andrei Lepikhov wrote:
> On 11/10/2023 02:29, Alena Rybakina wrote:
>> I have reviewed your patch and I noticed a few things.
>
> Thanks for your efforts,
>
>> I have looked at the latest version of the code, I assume that the
>> error lies in the replace_varno_walker function, especially in the
>> place where we check the node by type Var, and does not form any
>> NullTest node.
>
> It's not a bug, it's an optimization we discussed with Alexander above.
>
>> Secondly, I added some code in some places to catch erroneous cases
>> and added a condition when we should not try to apply the
>> self-join-removal transformation due to the absence of an empty
>> self-join list after searching for it and in general if there are no
>> joins in the query. Besides, I added a query for testing and wrote
>> about it above. I have attached my diff file.
> Ok, I will look at this
>> In addition, I found a comment for myself that was not clear to me. I
>> would be glad if you could explain it to me.
>>
>> You mentioned superior outer join in the comment, unfortunately, I
>> didn't find anything about it in the PostgreSQL code, and this
>> explanation remained unclear to me. Could you explain in more detail
>> what you meant?
> I meant here that only clauses pushed by
> reconsider_outer_join_clauses() can be found in the joininfo list, and
> they are not relevant, as you can understand.
> Having written that, I realized that it was a false statement. ;) -
> joininfo can also contain results of previous SJE iterations, look:
>
> CREATE TABLE test (oid int PRIMARY KEY);
> CREATE UNIQUE INDEX ON test((oid*oid));
> explain
> SELECT count(*)
> FROM test c1, test c2, test c3
> WHERE c1.oid=c2.oid AND c1.oid*c2.oid=c3.oid*c3.oid;
> explain
> SELECT count(*)
> FROM test c1, test c2, test c3
> WHERE c1.oid=c3.oid AND c1.oid*c3.oid=c2.oid*c2.oid;
> explain
> SELECT count(*)
> FROM test c1, test c2, test c3
> WHERE c3.oid=c2.oid AND c3.oid*c2.oid=c1.oid*c1.oid;

Ok, I understood. Thank you for explanation.

--
Regards,
Alena Rybakina

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-10-13 09:02:39 Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set
Previous Message Alena Rybakina 2023-10-13 08:39:41 Re: A new strategy for pull-up correlated ANY_SUBLINK