Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)
Date: 2023-05-15 07:05:09
Message-ID: CAMbWs4-doWdYa3AJdw4wMAn7ApSSyRDQrWuFV1_xW5muY0mLJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sat, May 13, 2023 at 1:25 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I see that what you've been doing here is basically trying to rescue my
> previous patch attempt. I'm not in love with that approach anymore.
> I think a better fix is to do what I suggested a bit upthread: forcibly
> mark upper "C" Vars with both joins in varnullingrels, so that the
> planner is always working with upper quals that look like they'd come
> from input written according to the second form of the identity.
> Here's a patch series that does it that way.

I explored this approach before and found that there are several
problems with it that we need to resolve to make it work [1]. I
reviewed the patches here and it seems that the second problem still
exists. Quote it here.

* We need to revise how we check if identity 3 applies and if so how we
adjust min_lefthand in make_outerjoininfo. Consider

A leftjoin B on (Pab) leftjoin C on (Pbc) leftjoin D on (Pcd)

So now the C Vars in Pcd are marked with A/B join and B/C join since the
two joins can commute. When it comes to check if C/D join can commute
with B/C join, the A/B join from C Vars' nullingrels would make us think
that C/D join cannot commute with B/C join, which is not correct.
Besides, C/D join's min_lefthand including A/B join is also problematic.

This problem would cause us to be unable to find some join orders. Here
is an example.

create table t (a int, b int);
insert into t select i, i from generate_series(1,1000)i;
analyze t;

explain (costs on)
select * from t t1
left join t t2 on t1.a = t2.a
left join t t3 on t2.a != t3.a
left join t t4 on t3.a = t4.a;

v15 does

QUERY PLAN
--------------------------------------------------------------------------------
Nested Loop Left Join (cost=55.00..15115.00 rows=999000 width=32)
Join Filter: (t2.a <> t3.a)
-> Hash Left Join (cost=27.50..56.25 rows=1000 width=16)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=8)
-> Hash (cost=15.00..15.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
-> Materialize (cost=27.50..61.25 rows=1000 width=16)
-> Hash Left Join (cost=27.50..56.25 rows=1000 width=16)
Hash Cond: (t3.a = t4.a)
-> Seq Scan on t t3 (cost=0.00..15.00 rows=1000 width=8)
-> Hash (cost=15.00..15.00 rows=1000 width=8)
-> Seq Scan on t t4 (cost=0.00..15.00 rows=1000
width=8)

but with this patch we do

QUERY PLAN
--------------------------------------------------------------------------------
Hash Left Join (cost=55.00..28837.50 rows=999000 width=32)
Hash Cond: (t3.a = t4.a)
-> Nested Loop Left Join (cost=27.50..15073.75 rows=999000 width=24)
Join Filter: (t2.a <> t3.a)
-> Hash Left Join (cost=27.50..56.25 rows=1000 width=16)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=8)
-> Hash (cost=15.00..15.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000
width=8)
-> Materialize (cost=0.00..20.00 rows=1000 width=8)
-> Seq Scan on t t3 (cost=0.00..15.00 rows=1000 width=8)
-> Hash (cost=15.00..15.00 rows=1000 width=8)
-> Seq Scan on t t4 (cost=0.00..15.00 rows=1000 width=8)

We could not find the order (t1/t2)/(t3/t4) because we think that t3/t4
join can not commute with t2/t3. As a result we get a much more
expensive plan (28837.50 VS 15115.00).

Also, it seems that PHVs in outer join's target list are not handled
correctly, check the query below which causes assertion failure in
search_indexed_tlist_for_phv.

# select * from t t1
left join (select now() from t t2
left join t t3 on t2.a = t3.a
left join t t4 on t3.a = t4.a) s on true
inner join t t5 on true;
server closed the connection unexpectedly

BTW, there is a typo in the 0001's subject. It should be
SpecialJoinInfo.commute_below rather than RestrictInfo.commute_below.

[1]
https://www.postgresql.org/message-id/CAMbWs4_Bt2YV-geNmLHyS6hmEbkX4C7rV4miaiMUK7%2Be2-gJfQ%40mail.gmail.com

Thanks
Richard

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alexander Lakhin 2023-05-15 08:00:00 Re: BUG #17884: gist_page_items() crashes for a non-leaf page of an index with non-key columns
Previous Message PG Bug reporting form 2023-05-15 04:41:12 BUG #17931: Faild to stop PostgresSQL Service