Re: A problem in deconstruct_distribute_oj_quals

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A problem in deconstruct_distribute_oj_quals
Date: 2023-02-07 08:08:21
Message-ID: CAMbWs49cGJCNTuunE=T9E=e5J12q4WZ_ksDXTYpQiaWLXjbAaQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 7, 2023 at 2:12 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > I noticed this code because I came across a problem with a query as
> > below.
>
> > create table t (a int);
>
> > select t1.a from (t t1 left join t t2 on true) left join (t t3 left join t
> > t4 on t3.a = t4.a) on t2.a = t3.a;
>
> > When we deal with qual 't2.a = t3.a', deconstruct_distribute_oj_quals
> > would always add the OJ relid of t3/t4 into its required_relids, due to
> > the code above, which I think is wrong. The direct consequence is that
> > we would miss the plan that joins t2 and t3 directly.
>
> I don't see any change in this query plan when I remove that code, so
> I'm not sure you're explaining your point very well.

Sorry I didn't make myself clear. The plan change may not be obvious
except when the cheapest path happens to be joining t2 and t3 first and
then joining with t4 afterwards. Currently HEAD would not generate such
a path because the joinqual of t2/t3 always has the OJ relid of t3/t4 in
its required_relids.

To observe an obvious plan change, we can add a unique constraint on 'a'
and look at how outer-join removal works.

alter table t add unique (a);

-- with that code
# explain (costs off)
select t1.a from (t t1 left join t t2 on true) left join (t t3 left join t
t4 on t3.a = t4.a) on t2.a = t3.a;
                    QUERY PLAN
---------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on t t1
   ->  Nested Loop Left Join
         ->  Seq Scan on t t2
         ->  Index Only Scan using t_a_key on t t3
               Index Cond: (a = t2.a)
(6 rows)

-- without that code
# explain (costs off)
select t1.a from (t t1 left join t t2 on true) left join (t t3 left join t
t4 on t3.a = t4.a) on t2.a = t3.a;
          QUERY PLAN
------------------------------
 Nested Loop Left Join
   ->  Seq Scan on t t1
   ->  Materialize
         ->  Seq Scan on t t2
(4 rows)

This is another side-effect of that code. The joinqual of t2/t3 is
treated as being pushed down when we try to remove t2/t3, because its
required_relids, which incorrectly include the OJ relid of t3/t4,
exceed the scope of the join. This is not right.
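
The subset test behind "treated as being pushed down" can be sketched roughly
as follows. This is a toy model only, not planner code: relid sets are plain
bitmasks rather than Bitmapsets, the relid numbers are hypothetical, the scope
of the t2/t3 join is simplified to just {t2, t3}, and all function names are
made up for illustration. It is loosely modeled on the subset check that
RINFO_IS_PUSHED_DOWN performs with bms_is_subset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the planner's Relids: one bit per relid. */
typedef uint32_t RelidSet;

/* Hypothetical relid numbering, chosen only for this illustration. */
enum
{
    RELID_T2 = 2,
    RELID_T3 = 4,
    RELID_OJ_T3_T4 = 6          /* OJ relid of the t3/t4 left join */
};

static RelidSet
relid_bit(int relid)
{
    return (RelidSet) 1 << relid;
}

/* True if every member of a is also a member of b. */
static bool
relids_is_subset(RelidSet a, RelidSet b)
{
    return (a & ~b) == 0;
}

/*
 * A qual whose required_relids are not contained in the join's relids
 * is treated as pushed down into that join from above.
 */
static bool
qual_is_pushed_down(RelidSet required_relids, RelidSet joinrelids)
{
    return !relids_is_subset(required_relids, joinrelids);
}

/* required_relids of 't2.a = t3.a', with or without the extra OJ relid. */
static RelidSet
qual_required_relids(bool add_oj_relid)
{
    RelidSet    result = relid_bit(RELID_T2) | relid_bit(RELID_T3);

    if (add_oj_relid)
        result |= relid_bit(RELID_OJ_T3_T4);
    return result;
}

/* Simplified scope of the t2/t3 join: just the two base relations. */
static RelidSet
t2_t3_joinrelids(void)
{
    return relid_bit(RELID_T2) | relid_bit(RELID_T3);
}
```

In this model, qual_is_pushed_down(qual_required_relids(true),
t2_t3_joinrelids()) is true, so the qual no longer counts as the join's own
clause, while passing false keeps it inside the join's scope.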

Thanks
Richard
