Re: PlaceHolderVars versus join ordering

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PlaceHolderVars versus join ordering
Date: 2010-09-27 17:27:35
Message-ID: AANLkTimsMFQOHv=ihVUHwXu-O0okh1QQK6wKwLihjpK4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 27, 2010 at 1:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I've been looking at the bug reported by Kirill Simonov here:
> http://archives.postgresql.org/pgsql-bugs/2010-09/msg00265.php
> This is pretty ugly :-(.  What we have is
>        select * from A
>        left join (select some-expressions from B left join C ...)
> The expressions are not guaranteed nullable; that is, they won't
> automatically produce nulls from all-null rows for B and C.
> This means that we have to evaluate the expressions at the join of
> B and C, producing PlaceHolderVars, which then bubble up through
> the left join with A, and will get replaced with nulls in any
> null-extended row of that join.  Then references to them in the final
> select list will correctly show as nulls.
>
> The problem is that there isn't anything guaranteeing that the B/C join
> will be formed before joining to A.  In Kirill's example the planner
> thinks it should commute the two joins, and then it evaluates the
> PlaceHolderVars at the now-upper B/C join, meaning they fail to go to
> nulls in rows where they should.
>
> This is a fundamental oversight in the original design of the
> PlaceHolderVar mechanism.  I'm not sure how come I didn't see that some
> sort of join-order interlock was necessary, but I didn't.
>
> The most trustworthy fix that I can think of goes like this:
>
> 1. Add an extra pass over the parsetree early in query_planner() to
> locate PlaceHolderVars, set up the PlaceHolderInfo list, and remember the
> syntactic level of each actually-used PlaceHolderVar.  It's slightly
> annoying to have to make an extra tree traversal for this, but at least
> we can skip it in the common case where no PlaceHolderVars were created.
> (Currently, this processing happens as a side-effect during
> distribute_qual_to_rels, but that's too late for what we need to do in
> point 2.  Also, making the PlaceHolderInfos immediately when the
> PlaceHolderVars are created would be unpalatable since we don't know at
> that stage whether the PlaceHolderVars are actually used above the
> relevant outer join.)
>
> 2. In make_outerjoininfo, scan the PlaceHolderInfo list to see if there
> are any placeholders that should be evaluated below the current outer join
> and are needed above it.  If so, enlarge the outer join's min_righthand
> set to ensure that it won't get moved below where the placeholder has to
> be evaluated.  We can fold the existing fix_placeholder_eval_levels()
> logic into this operation, which will make that code less fragile than it
> is now.
>
> This is a larger change than I would prefer to back-patch, but the only
> less-invasive alternative I can see is to lobotomize the PlaceHolderVar
> mechanism entirely by reverting to 8.3-style logic wherein we prevented
> pullup of sub-selects that would require introduction of placeholders.
> That would undo a significant optimization feature of 8.4, one that
> I believe we're now relying on for reasonable performance of some system
> views.
>
> Thoughts, better ideas?

Personally, I would rather back-patch a more invasive bug fix than a
performance regression.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-09-27 17:33:05 Re: client socket TIME_WAIT state after PQfinish
Previous Message Tom Lane 2010-09-27 17:21:39 Re: Improving prep_buildtree used in VPATH builds