Re: Removing unneeded self joins

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2020-09-23 04:23:27
Message-ID: CAApHDvpggnFMC4yP-jUO7PKN=fXeErW5bOxisvJ0HvkHQEY=Ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 3 Apr 2020 at 17:43, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> v.23 in attachment:

Hi,

This is only a partial review as I see the patch still does not
incorporate the self join detection method I mentioned in March 2019
and the one the patch only partially works.

Here's the partial review which contains the details:

1. Change to aset.c not relevant to this patch.

--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1485,7 +1485,6 @@ AllocSetCheck(MemoryContext context)

chsize = chunk->size; /* aligned chunk size */
dsize = chunk->requested_size; /* real data */
-
/*
* Check chunk size
*/

2. Why GUC_NOT_IN_SAMPLE?

+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE

3. I'd expect this to remove the join.

create table t1 (
a int not null unique,
b int not null unique,
c int not null unique,
d int not null,
e int not null
);

explain select * from t1 inner join t1 t2 on t1.a=t2.a and t1.b=t2.c;
QUERY PLAN
----------------------------------------------------------------------
Hash Join (cost=52.50..88.42 rows=1 width=40)
Hash Cond: ((t1.a = t2.a) AND (t1.b = t2.c))
-> Seq Scan on t1 (cost=0.00..27.00 rows=1700 width=20)
-> Hash (cost=27.00..27.00 rows=1700 width=20)
-> Seq Scan on t1 t2 (cost=0.00..27.00 rows=1700 width=20)
(5 rows)

This one seems to work.

This one *does* seem to work.

explain select * from t1 inner join t1 t2 on t1.a=t2.a and t1.d=t2.e;

You should likely implement the method I wrote in the final paragraph in [1]

The basic idea here is you first process the join quals:

t1.a=t2.a and t1.b=t2.c

and keep just the ones that use the same varattno on either side
(t1.a=t2.a). Perhaps it's not worth thinking about Exprs for now, or
if you think it is you can normalise the varnos to 1 and equal()

Then just pass the remaining quals to relation_has_unique_index_for().
If it returns true then there's no need to perform the opposite check
for the other relation. It would just return the same thing.

This will allow you to get rid of using the following as proofs:

/* We must have the same unique index for both relations. */
if (outerinfo->index->indexoid != innerinfo->index->indexoid)
return false;

... as I've shown above. This only works in some cases and that's not
really good enough.

Doing thing the way I describe will allow you to get rid of all the
UniqueRelInfo stuff.

4. Should be ok for partitioned tables though:

/*
* This optimization won't work for tables that have inheritance
* children.
*/
if (rte->inh)
continue;

David

[1] https://www.postgresql.org/message-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg%2Bo1MGZk4mHBn_Rg%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message k.jamison@fujitsu.com 2020-09-23 04:23:29 RE: [Patch] Optimize dropping of relation buffers using dlist
Previous Message Thomas Munro 2020-09-23 04:11:42 Re: Syncing pg_multixact directories