Removing unneeded self joins

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Removing unneeded self joins
Date: 2018-05-16 15:43:33
Message-ID: 64486b0b-0404-e39e-322d-0801154901f3@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

There is a join optimization we don't do -- removing inner join of a
table with itself on a unique column. Such joins are generated by
various ORMs, so from time to time our customers ask us to look into
this. Most recently, it was discussed on the list in relation to an
article comparing the optimizations that some DBMS make [1].

I started to explore what can be done about this. Attached is a proof of
concept patch. It works for some simple cases:

create table tt(a int primary key, b text);
explain select p.* from tt p join (select * from tt where b ~~ 'a%') q
on p.a = q.a;

                      QUERY PLAN
──────────────────────────────────────────────────────
 Seq Scan on tt p  (cost=0.00..25.88 rows=6 width=36)
   Filter: (b ~~ 'a%'::text)

It also works for semi-joins like `explain select p.* from tt p where
exists (select * from tt where b ~~ 'a%' and a = p.a);`. This requires a
preparatory step of reducing unique semi joins to inner joins, and we
already do this (reduce_unique_semijoin).

What this patch tries to do is to remove these inner joins when a single
join is being planned (populate_joinrel_with_paths). The main entry
point is reduce_self_unique_join. First, it proves that both input
relations are uniquely constrained by the same index given the
particular join clauses. We already have a way to find such indexes
(relation_has_unique_index_for), so I was able to reuse this. What I'm
not sure about is how to properly remove the join after that. For now, I
just pretend that the join relation being built is the outer baserel,
add to it the restrictions from the inner relation, and then plan it as
usual. Maybe there is a less hacky way to do it? I've seen elsewhere a
suggestion to use an AppendPath for a similar purpose, but here we can't
just use the outer relation we've already planned because the
restriction list is different.

I'd be glad to hear your thoughts on this.

[1]
https://www.postgresql.org/message-id/flat/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw%40mail(dot)gmail(dot)com#CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw(at)mail(dot)gmail(dot)com

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
remove-self-join-v1.patch text/x-patch 18.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2018-05-16 15:54:13 Re: libpq compression
Previous Message Heikki Linnakangas 2018-05-16 15:31:28 Re: Considering signal handling in plpython again