Re: Removing unneeded self joins

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2019-03-25 04:07:33
Message-ID: CAKJS1f8cJOCGyoxi7a_LG7eu+WKF9+HTff3wp1KKS5gcUg2Qfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 23 Mar 2019 at 04:13, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> On Sat, 23 Mar 2019 at 03:39, Alexander Kuzmenkov
> > This is a bug indeed. Unique index search is not exhaustive, so if many
> > indexes match the join quals, we might not find the same index for both
> > sides. I think this can be overcome if we switch to exhaustive search in
> > relation_has_unique_index_for, and then try to find matching indexes in
> > is_unique_self_join. Please see the new patch for the fix.
>
> I really don't think modifying relation_has_unique_index_for to
> collect details for up to 5 indexes is a good fix for this. It looks
> like it's still possible to trigger this, just the example would need
> to be more complex. Also, you've likely just more than doubled the
> runtime of a successful match in relation_has_unique_index_for().
> Previously, on average we'd have found that match halfway through the
> list of indexes. Now it's most likely you'll end up looking at every
> index, even after a match has been found. That does not seem well
> aligned to keeping the CPU overhead for the patch low.
>
> What is wrong with just weeding out join quals that don't have the
> equivalent expression on either side before passing them to
> relation_has_unique_index_for()? That'll save you from getting false
> matches like I showed. To make that work it just seems mostly like
> you'd mostly just need to swap the order of operations in the patch,
> but you'd also likely end up needing to rip out all the UniqueRelInfo
> code, since I don't think that'll be needed any longer. Likely that
> means your entire patch would be limited to analyzejoins.c, although
> I'm unsure what of the eclass editing code should be moved into
> equivclass.c.

I've done some minimal modifications to your v12 patch to make it work
the way I described. I ended up splitting out the joinqual list into
two lists; one that contains just self join quals that match on either
side, and the other with the remaining quals. We just pass the self
join matching quals to innerrel_is_unique() so we no longer can trick
it into matching with the wrong index.

I didn't modify remove_self_join_rel(). It could make use of the split
lists instead of checking what we've already checked in
split_selfjoin_quals(). i.e. selfjoinquals get a IS NOT NULL test
added to the basequals and the otherjoinquals have their varnos
changed and then applied to the basequals too.

There was a couple of regression test failures using your version of
the tests. One test just went back to what it was before you changed
the output and the other seems like a missed optimisation in your
version of the patch.

Namely, you didn't remove the self join in a case like:

explain select * from t1 inner join t1 t2 on t1.a=t2.a where t1.b = 1
and t2.b = 2;

but my version does. You had commented the test with:

-- If index conditions are different for each side, we won't select the same
-- row on both sides, so the join can't be removed.

but I don't quite understand why we can't remove the join in this
situation. For this particular case no rows can match, so maybe the
plan should really be the same as what happens when you do:

# explain select * from t1 where b = 1 and b = 2;
QUERY PLAN
-------------------------------------------------------------------------
Result (cost=0.15..8.17 rows=1 width=8)
One-Time Filter: false
-> Index Scan using t1_b_key on t1 (cost=0.15..8.17 rows=1 width=8)
Index Cond: (b = 1)
(4 rows)

For now, it still produces a plan like:

QUERY PLAN
-----------------------------------------------------------------------
Index Scan using t1_b_key on t1 t2 (cost=0.15..8.17 rows=1 width=16)
Index Cond: ((b = 2) AND (b = 1))
Filter: (a IS NOT NULL)
(3 rows)

My implementation of split_selfjoin_quals() still needs work. I only
wrote code to handle simple Vars. There could be Unique indexes on
exprs too, which the current code will fail to detect. I don't think
the join quals can contain vars from higher levels, but I didn't go to
the trouble of checking that in detail. If it's possible then we'll
need to reject those.

I also finished off the renaming of remove_useless_left_joins(). I
didn't go into any detail on the actual removal code.

I've attached a complete patch and also a delta against v12.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v14-remove-unique-self-joins.patch application/octet-stream 61.6 KB
v14-remove-unique-self-joins_delta.patch application/octet-stream 25.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nagaura, Ryohei 2019-03-25 05:26:25 RE: Timeout parameters
Previous Message Thomas Munro 2019-03-25 04:01:07 Re: Usage of epoch in txid_current