From: | Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Removing unneeded self joins |
Date: | 2023-10-16 08:28:13 |
Message-ID: | 8800ccf2-ae82-461c-92a1-d4c0a22f8cf2@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 12/10/2023 18:32, Alexander Korotkov wrote:
> On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 4/10/2023 14:34, Alexander Korotkov wrote:
>>> > Relid replacement machinery is the most contradictory code here. We used
>>> > a utilitarian approach and implemented a simplistic variant.
>>>
>>> > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
>>> > > they are not necessary. [1] points that infrastructure from [2] might
>>> > > be useful. The patchset from [2] seems committed mow. However, I
>>> > > can't see it is directly helpful in this matter. Could we just skip
>>> > > adding IS NOT NULL clause for the columns, that have
>>> > > pg_attribute.attnotnull set?
>>> > Thanks for the links, I will look into that case.
>> To be more precise, in the attachment, you can find a diff to the main
>> patch, which shows the volume of changes to achieve the desired behaviour.
>> Some explains in regression tests shifted. So, I've made additional tests:
>>
>> DROP TABLE test CASCADE;
>> CREATE TABLE test (a int, b int not null);
>> CREATE UNIQUE INDEX abc ON test(b);
>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>> WHERE t1.b=t2.b;
>> CREATE UNIQUE INDEX abc1 ON test(a,b);
>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>> WHERE t1.b=t2.b;
>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
>> DROP INDEX abc1;
>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
>>
>> We have almost the results we wanted to have. But in the last explain
>> you can see that nothing happened with the OR clause. We should use the
>> expression mutator instead of walker to handle such clauses. But It
>> doesn't process the RestrictInfo node ... I'm inclined to put a solution
>> of this issue off for a while.
>
> OK. I think it doesn't worth to eliminate IS NULL quals with this
> complexity (at least at this stage of work).
>
> I made improvements over the code. Mostly new comments, grammar
> corrections of existing comments and small refactoring.
>
> Also, I found that the suggestion from David Rowley [1] to qsort
> array of relations to faster find duplicates is still unaddressed.
> I've implemented it. That helps to evade quadratic complexity with
> large number of relations.
>
> Also I've incorporated improvements from Alena Rybakina except one for
> skipping SJ removal when no SJ quals is found. It's not yet clear for
> me if this check fix some cases. But at least optimization got skipped
> in some useful cases (as you can see in regression tests).
I would like to propose one more minor improvement (see in attachment).
The idea here is that after removing a self-join and changing clauses we
should re-probe the set of relids with the same Oid, because we can find
more removable self-joins (see the demo test in join.sql).
--
regards,
Andrey Lepikhov
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
iterative_elimination.diff | text/plain | 4.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Ronan Dunklau | 2023-10-16 08:31:32 | Re: LLVM 16 (opaque pointers) |
Previous Message | Nikita Malakhov | 2023-10-16 08:20:58 | Re: remaining sql/json patches |