Re: Removing unneeded self joins

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2020-11-27 16:49:57
Message-ID: 16aa5d06-d8c6-416a-cb71-21663019944e@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
> + /*
> + * Process restrictlist to seperate out the self join quals from
> + * the other quals. e.g x = x goes to selfjoinquals and a = b to
> + * otherjoinquals.
> + */
> + split_selfjoin_quals(root, restrictlist, &selfjoinquals,
> + &otherjoinquals);
> +
> + if (list_length(selfjoinquals) == 0)
> + {
> + /*
> + * Have a chance to remove join if target list contains vars from
> + * the only one relation.
> + */
> + if (list_length(otherjoinquals) == 0)
> + {
> + /* Can't determine uniqueness without any quals. */
> + continue;
> +
> + }
> + else if (!tlist_contains_rel_exprs(root, joinrelids, inner))
> + {
> + if (!innerrel_is_unique(root, joinrelids, outer->relids,
> + inner, JOIN_INNER, otherjoinquals,
> + false))
> + continue;
> + }
> + else
> + /*
> + * The target list contains vars from both inner and outer
> + * relations.
> + */
> + continue;
> + }
> +
> + /*
> + * Determine if the inner table can duplicate outer rows. We must
> + * bypass the unique rel cache here since we're possibly using a
> + * subset of join quals. We can use 'force_cache' = true when all
> + * join quals are selfjoin quals. Otherwise we could end up
> + * putting false negatives in the cache.
> + */
> + else if (!innerrel_is_unique(root, joinrelids, outer->relids,
> + inner, JOIN_INNER, selfjoinquals,
> + list_length(otherjoinquals) == 0))
> + continue;

I don't understand the logic here. If 'selfjoinquals' is empty, it means
that there is no join qual between the two relations, right? How can we
ever remove the join in that case? And how does the target list affect
that? Can you give an example query of that?

> --- a/src/test/regress/expected/join.out
> +++ b/src/test/regress/expected/join.out
> @@ -4553,11 +4553,13 @@ explain (costs off)
> select p.* from
> (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
> where p.k = 1 and p.k = 2;
> - QUERY PLAN
> ---------------------------
> + QUERY PLAN
> +------------------------------------------------
> Result
> One-Time Filter: false
> -(2 rows)
> + -> Index Scan using parent_pkey on parent x
> + Index Cond: (k = 1)
> +(4 rows)
>
> -- bug 5255: this is not optimizable by join removal
> begin;

That doesn't seem like an improvement...

My general impression of this patch is that it's a lot of code to deal
with a rare special case. At the beginning of this thread there was
discussion on the overhead that this might add to planning queries that
don't benefit, but adding a lot of code isn't nice either, even if the
performance is acceptable. That's not necessarily a show-stopper, but it
does make me much less excited about this. I'm not sure what to suggest
to do about that, except a really vague "Can you make is simpler?"

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-11-27 16:53:07 Re: remove spurious CREATE INDEX CONCURRENTLY wait
Previous Message Stephen Frost 2020-11-27 16:28:20 Re: A few new options for CHECKPOINT