Re: Improving planner variable handling

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Improving planner variable handling
Date: 2008-04-16 09:46:58
Message-ID: 4805CB12.6040402@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wonder if this would help to clean up the equivalence class hacks in
Greg's ordered append patch?

Tom Lane wrote:
> I've been thinking about how to improve the planner's poor handling of
> variables in outer-join situations. Here are some past examples for
> motivation:
>
> http://archives.postgresql.org/pgsql-hackers/2006-02/msg00154.php
> http://archives.postgresql.org/pgsql-general/2008-03/msg01440.php
>
> The reason why the planner acts so stupidly in these examples is that
> we're still using a kluge solution for this old bug:
> http://archives.postgresql.org/pgsql-bugs/2001-04/msg00223.php
>
> The root of the problem is that the planner does not worry about computing
> output expressions until the top of a plan tree. All lower-level join
> nodes are made to output simple lists of Vars referencing columns of the
> base relations of the join. We handle outer-join cases by forcing the
> values of the Vars of the nullable side to null at the level of the join,
> whenever there's no matching row in the nullable side. If one of the base
> relations of the join is a sub-SELECT whose output list includes
> expressions that don't certainly go to null when the input variables are
> forced to null, then we can't flatten that sub-SELECT, because flattening
> the sub-SELECT means that the expression evaluations bubble to the top of
> the plan tree and can produce non-null results when they shouldn't
> (as happened in the above bug, before we realized that we had to prevent
> flattening in this case).
>
> Another problem with this approach is that depending on what level of the
> plan tree you are thinking about, a Var nominally referencing tab.col
> might really mean the value of tab.col, or it might mean "either tab.col
> or NULL depending on what happened at some lower level of outer join".
> Since we can't readily tell the difference, we have estimation errors
> arising from failure to expect some NULLs (there have been recent
> complaints about this), and we need some pretty ugly kluges in places like
> EquivalenceClass processing to handle the risk that apparently identical
> expressions might not really be equal.
>
> I think the basic solution for this is that upper levels of the plan tree
> should refer to the nullable output columns of an outer join using
> "alias Vars" that name the join rel, not the underlying base relation,
> even if there is a simple base-relation Var that the alias is tracking.
> In the case involving a sub-SELECT, the alias Var would stand for whatever
> output expression appears in the sub-SELECT. We already have the concept
> of these alias Vars, in fact --- that's exactly the representation emitted
> by the parser. But historically the planner has smashed aliases down to
> their base Vars as early as possible (see flatten_join_alias_vars).
> That has some advantages but I'm thinking it's outweighed by the
> disadvantages. I'd like to try leaving alias Vars as aliases all the
> way through the planner, in any case where they might be semantically
> different from their referent (ie, whenever there's a possible
> force-to-null involved).
>
> To make this work, we'd need to have the constructed plan tree compute the
> alias Var from its referent expression at the lowest outer-join that can
> null the alias Var. The trick for the executor is to know when to force
> the value to null instead of computing the expression. I first thought
> about marking entries of the join node's targetlist as to be forced to
> null if left or right input row is null. However, that fails if we want
> the join node to compute some projection expressions on top of the raw
> join output (as would certainly happen if it were the top node of the
> tree, for example). That could be handled by inserting another level of
> plan node (ie, a Result) to do the projection, but that seems a pretty
> ugly and inefficient solution. What I have in mind instead is to insert a
> new kind of expression node "ForceToNull" atop the referent expression,
> with this node able to look at the EState (in the same way a regular Var
> node would) to see if it should return a null instead of computing its
> child expression. Then expansion of an alias Var into a ForceToNull and
> the underlying expression would work.
>
> I'm envisioning keeping track of active alias Vars and their expansions in
> a new list attached to the PlannerInfo "root" node. This would provide a
> place to record important information like which level of the join tree
> such a Var needs to be evaluated at.
>
> This is all pretty handwavy yet, but I don't think I'll be able to fill in
> many more details until I try to code it. I thought I'd put up this
> summary to see if anyone can shoot holes in it at this level of detail ...
> Comments?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2008-04-16 10:07:09 Re: Improve shutdown during online backup
Previous Message Magnus Hagander 2008-04-16 09:25:12 Re: Problem with site doc search