FlattenedSubLink is a crock, it's going away

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: FlattenedSubLink is a crock, it's going away
Date: 2009-02-24 22:21:25
Message-ID: 28996.1235514085@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

[ still working on the planner issues exposed by Kevin Grittner's complaint ]

I've concluded that my introduction of FlattenedSubLink as a transient
representation for flattened EXISTS/IN clauses was a Bad Idea. The core
problem is that it messes up the delicate order of operations in
deconstruct_recurse(). What we want to do there with an outer join is
to form its SpecialJoinInfo (mainly since we have to determine its
min_lefthand/min_righthand for possible use in placing its quals), then
place its quals, then add the SpecialJoinInfo to join_info_list. The
reason for this ordering is that during distribute_qual_to_rels we want
only *lower* outer joins to be in join_info_list. As the current code
stands, if there are any FlattenedSubLinks in an outer join's quals
they'll be converted to SpecialJoinInfos during the qual processing,
which is too late because they should logically be underneath the outer
join, where they can affect the min_lefthand/min_righthand we compute
for it. CVS HEAD happens to work anyway because the relations involved
in the semi or antijoin were collected into a FromExpr by
pull_up_sublinks, and that causes deconstruct_recurse to (mistakenly!)
treat them as a lower inner join, and so the min_righthand of the upper
outer join gets set in a way that prevents improper join ordering. That
behavior is, um, an unintentional side-effect, and it seems fragile as
heck. What's worse, there are cases it gets wrong anyway: one being
pseudoconstant quals in an EXISTS, for example

regression=# prepare foo(bool) as select count(*) from tenk1 a left join tenk1 b
regression-# on (a.unique2 = b.unique1 and exists
regression(# (select 1 from tenk1 c where c.thousand = b.unique2 and $1));
PREPARE
regression=# execute foo(false);
count
-------
0 -- wrong answer; one-time qual is in wrong place
(1 row)

What I'm now thinking is appropriate is for pull_up_sublinks to
explicitly generate a JoinExpr for each semijoin or antijoin it creates,
and insert this into the jointree at a semantically legal place. This
will make the recursion interfaces within pull_up_sublinks a tad more
complex, which is what I'd hoped to avoid in the first place by
inventing FlattenedSubLink. But I'm seeing that the complexity and
bug potential added elsewhere is worse.

Normally a JoinExpr requires an associated RTE, but that's really only
needed as a reference point for join alias Vars. Since there won't be
any alias Vars for an IN/EXISTS join, I think I can avoid building an
RTE and just set the JoinExpr's rtindex to zero. Nothing outside the
planner is ever going to see such a generated JoinExpr anyway.

Comments, objections?

regards, tom lane

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2009-02-24 22:24:11 Re: Synchronous replication & Hot standby patches
Previous Message Robert Haas 2009-02-24 21:52:38 Re: Synchronous replication & Hot standby patches