Re: Removing unneeded self joins

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2019-03-14 11:21:29
Message-ID: CAKJS1f8VV4YvsC_YSNHVtKfmqTAnrfcDoH21gVgAJNAFmwK6zA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 1 Mar 2019 at 03:09, Alexander Kuzmenkov
<a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
> New version is attached.

I started looking over v11 and I'm wondering why you really need to
know which unique index proved the join unique?

I removed that check and I see it causes the following to remove the self-join:

create unique index on t1(a);
create unique index on t1(b);
explain select * from t1 inner join t1 t2 on t1.a=t2.b;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 t2 (cost=0.00..38.25 rows=11 width=16)
Filter: (a = b)
(2 rows)

I had thought that the loop in is_unique_self_join() would have
rejected this, but looking more closely it's only checking the list of
join quals matched to each index match each other. Shouldn't it be
testing that the expressions on either side of the OpExprs are the
same after aligning the varnos? At the moment you're testing
innerrel_is_unique() for either side of the join, if you made this
change then couldn't you just test one rel and look for a unique index
on the join quals that match both sides of the join at the same time?
Probably relation_has_unique_index_for() is not the right tool for
that job, so you might need to write another function that does the
additional checks. Maybe it would be possible to split out some of the
code into helper functions so you don't have to copy the bulk of it.

The reason I mention this is that rel_is_distinct_for() also
considered subqueries. This allows the LEFT JOIN removal code to
remove joins like:

SELECT t1.* FROM t1 LEFT JOIN (SELECT DISTINCT a FROM t2) t2 on t1.a=t2.a;

In this case, the DISTINCT clause was used for unique proofs, not a
unique index.

It does not seem unreasonable that someone one day might want to
extend the self join removal code to do the same, e.g:

SELECT * FROM t1 INNER JOIN (SELECT DISTINCT a FROM t1) t2 on t1.a = t2.a;

and you're not setting it up to be very easy to do that because you're
insisting the proofs are unique indexes only.

What do you think?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-03-14 11:45:22 Re: Using the return value of strlcpy() and strlcat()
Previous Message Dagfinn Ilmari Mannsåker 2019-03-14 11:10:58 Re: Using the return value of strlcpy() and strlcat()