Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)
Date: 2023-05-12 17:25:10
Message-ID: 1526312.1683912310@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I see that what you've been doing here is basically trying to rescue my
previous patch attempt. I'm not in love with that approach anymore.
I think a better fix is to do what I suggested a bit upthread: forcibly
mark upper "C" Vars with both joins in varnullingrels, so that the
planner is always working with upper quals that look like they'd come
from input written according to the second form of the identity.
Here's a patch series that does it that way.

Although this passes check-world for me, there are some loose ends
that could use review:

* Does update_commutable_vars() update everything it needs to?
I modeled it on prepjointree's perform_pullup_replace_vars(), but
since this is happening later during planning there are additional
derived data structures it needs to update. Did I miss any?
Also, is it really necessary to update parse->targetList, or would
updating root->processed_tlist be sufficient at this point?

* As implemented, we'll invoke update_commutable_vars() once per
commutable pair of joins, assuming that they are written in the
first form of identity 3 (which'd be the common case). Is that
costly enough to justify inventing some way to combine the updates
for all such pairs into one querytree traversal?

* I changed the API of add_nulling_relids() so that it could support
updating an RTE_SUBQUERY query. This breaks the possibility of applying
it to the whole main Query (it would make the wrong choice of initial
levelsup), but we never do that. Would it be better to invent an
alternate entry point so that both cases could be supported? Should
remove_nulling_relids be changed similarly?

Also, while checking the test cases shown upthread, I was interested to
notice that this code actually chooses a different join order in some
cases. For the example at [1], v15 does

Nested Loop Left Join (cost=0.00..4.46 rows=16 width=4)
-> Seq Scan on t1 (cost=0.00..1.02 rows=2 width=1)
-> Materialize (cost=0.00..3.26 rows=8 width=3)
-> Nested Loop Left Join (cost=0.00..3.22 rows=8 width=3)
Join Filter: t3.x
-> Nested Loop Left Join (cost=0.00..2.09 rows=4 width=2)
Join Filter: t2.x
-> Seq Scan on t2 (cost=0.00..1.02 rows=2 width=1)
-> Materialize (cost=0.00..1.03 rows=2 width=1)
-> Seq Scan on t3 (cost=0.00..1.02 rows=2 width=1)
-> Materialize (cost=0.00..1.03 rows=2 width=1)
-> Seq Scan on t4 (cost=0.00..1.02 rows=2 width=1)

but with this patch we do

Nested Loop Left Join (cost=0.00..4.45 rows=16 width=4)
Join Filter: t3.x
-> Nested Loop Left Join (cost=0.00..3.22 rows=8 width=3)
-> Seq Scan on t1 (cost=0.00..1.02 rows=2 width=1)
-> Materialize (cost=0.00..2.11 rows=4 width=2)
-> Nested Loop Left Join (cost=0.00..2.09 rows=4 width=2)
Join Filter: t2.x
-> Seq Scan on t2 (cost=0.00..1.02 rows=2 width=1)
-> Materialize (cost=0.00..1.03 rows=2 width=1)
-> Seq Scan on t3 (cost=0.00..1.02 rows=2 width=1)
-> Materialize (cost=0.00..1.03 rows=2 width=1)
-> Seq Scan on t4 (cost=0.00..1.02 rows=2 width=1)

This is correct AFAICS, and it's slightly cheaper, so an optimistic
conclusion would be that we're making a cost-based decision to use a plan
shape that the old code could not find for some reason. But these costs
are pretty close, within the "fuzz factor", so this might be a random
effect. It might be interesting to see if inserting unequal amounts
of data in the tables would drive the costs further apart, allowing
us to say that this code really does beat v15 in terms of planning
flexibility.

regards, tom lane

[1] https://www.postgresql.org/message-id/CAMbWs4-_KdVEJ62o6KbtA%2B_KJnQa7WZCc48VsPQ9in6TSN0Kxg%40mail.gmail.com

Attachment Content-Type Size
v4-0001-Split-RestrictInfo.commute_below-into-LHS-and-RHS.patch text/x-diff 10.0 KB
v4-0002-Fix-mishandling-of-quals-above-a-commuted-pair-of.patch text/x-diff 20.8 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2023-05-12 17:49:07 Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)
Previous Message Tom Lane 2023-05-12 13:39:00 Re: BUG #17930: Regression in DISABLE TRIGGER ALL