commit 37d5ef90d226b0b170a755e221794acb4ff2771b
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Aug 1 14:26:48 2022 -0400

    Add overview documentation.

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 41c120e0cd..2b30d22aed 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -295,6 +295,191 @@ Therefore, we don't merge FROM-lists if the result would have too many
 FROM-items in one list.
 
 
+Vars and PlaceHolderVars
+------------------------
+
+A Var node is simply the parse-tree representation of a table column
+reference.  However, in the presence of outer joins, that concept is
+more subtle than it might seem.  We need to distinguish the values of
+a Var "above" and "below" any outer join that could force the Var to
+null.  As an example, consider
+
+	SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z)
+
+(Assume foo() is not strict, so that we can't reduce the left join to
+a plain join.)  A naive implementation might try to push the foo(t2.z)
+call down to the scan of t2, but that is not correct because
+(a) what foo() should actually see for a null-extended join row is NULL,
+and (b) if foo() returns false, we should suppress the t1 row from the
+join altogether, not emit it with a null-extended t2 row.  On the other
+hand, it *would* be correct (and desirable) to push the call down to
+the scan level if the query were
+
+	SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y AND foo(t2.z))
+
+This motivates considering "t2.z" within the left join's ON clause
+to be a different value from "t2.z" outside the JOIN clause.  The
+former can be identified with t2.z as seen at the relation scan level,
+but the latter can't.
+
+Another example occurs in connection with EquivalenceClasses (discussed
+below).  Given
+
+	SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE t1.x = 42
+
+we would like to put t1.x and t2.y and 42 into the same EquivalenceClass
+and then derive "t2.y = 42" to use as a restriction clause for the scan
+of t2.  However, it'd be wrong to conclude that t2.y will always have
+the value 42, or that it's equal to t1.x in every joined row.  We can
+solve this problem by deeming that "t2.y" in the ON clause refers to
+the relation-scan-level value of t2.y, but not to the value that y will
+have in joined rows, where it might be NULL rather than equal to t1.x.
+
+Therefore, each Var node is decorated with "varnullingrels", the set
+of the rangetable indexes of outer joins that potentially null this Var
+at the point where it appears in the query.  (Using a set, not an
+ordered list, is fine since it doesn't matter which join forced the
+value to null; and that avoids having to change the representation when
+we consider different outer-join orders.)  In the examples above, all
+occurrences of t1.x would have empty varnullingrels, since the left join
+doesn't null t1.  The t2 references within the JOIN ON clauses would
+also have empty varnullingrels, but other references to t2 columns would
+be labeled with the index of the JOIN's rangetable entry (RTE), so that
+they'd be understood as potentially different from the t2 values seen at
+scan level.  Labeling t2.z in the WHERE clause with the JOIN's RT index
+lets us recognize that that occurrence of foo(t2.z) cannot be pushed
+down to the t2 scan level: we cannot evaluate that value at the scan
+level, but only after the join has been done.
+
+For LEFT and RIGHT outer joins, only Vars coming from the nullable side
+of the join are marked with that join's RT index.  For FULL joins, Vars
+from both inputs are marked.  (Such marking doesn't let us tell which
+side of the full join a Var came from; but that information can be
+found elsewhere at need.)
+
+Notionally, a Var having nonempty varnullingrels can be thought of as
+	CASE WHEN any-of-these-outer-joins-produced-a-null-extended-row
+	  THEN NULL
+	  ELSE the-scan-level-value-of-the-column
+	  END
+It's only notional, because no such calculation is ever done explicitly.
+In a finished plan, Vars occurring in scan-level plan nodes represent
+the actual table column values, but upper-level Vars are always
+references to outputs of lower-level plan nodes.  When a join node emits
+a null-extended row, it just returns nulls for the relevant output
+columns rather than copying up values from its input.  Because we don't
+ever have to do this calculation explicitly, it's not necessary to
+distinguish which side of an outer join got null-extended, which'd
+otherwise be essential information for FULL JOIN cases.
+
+Outer join identity 3 (discussed above) complicates this picture
+a bit.  In the form
+	A leftjoin (B leftjoin C on (Pbc)) on (Pab)
+all of the Vars in clauses Pbc and Pab will have empty varnullingrels,
+but if we start with
+	(A leftjoin B on (Pab)) leftjoin C on (Pbc)
+then the parser will have marked Pbc's B Vars with the A/B join's
+RT index, making this form artificially different from the first.
+To resolve this, once we have noted that Pbc is strict we run
+through that clause and remove any varnullingrels references to
+left joins in the lefthand side.  That makes the clause equivalent
+to what it would have looked like if the first form were presented,
+so that we can freely consider both join orders.  However, because
+we have done this, if we do construct a plan based on the second
+join order then we cannot cross-check that B Vars appearing above
+the A/B join are all marked with that join's RT index.  That would
+be a useful cross-check to have to catch planner bugs, but it
+doesn't seem useful enough to justify the extra complication of
+devising a representation that would support it.
+
+Outer joins also complicate handling of subquery pull-up.  Consider
+
+	SELECT ..., ss.x FROM tab1
+	  LEFT JOIN (SELECT *, 42 AS x FROM tab2) ss ON ...
+
+We want to be able to pull up the subquery as discussed previously,
+but we can't just replace the "ss.x" Var in the top-level SELECT list
+with the constant 42.  That'd result in always emitting 42, rather
+than emitting NULL in null-extended join rows.
+
+To solve this, we introduce the concept of PlaceHolderVars.
+A PlaceHolderVar is somewhat like a Var, in that its value originates
+at a relation scan level and can then be forced to null by higher-level
+outer joins; hence PlaceHolderVars carry a set of nulling rel IDs just
+like Vars.  Unlike a Var, whose original value comes from a table,
+a PlaceHolderVar's original value is defined by a query-determined
+expression ("42" in this example); so we represent the PlaceHolderVar
+as a node with that expression as child.  We insert a PlaceHolderVar
+whenever subquery pullup needs to replace a subquery-referencing Var
+that has nonempty varnullingrels with an expression that is not simply a
+Var.  (When the replacement expression is a pulled-up Var, we can just
+add the replaced Var's varnullingrels to its set.  Also, if the replaced
+Var has empty varnullingrels, we don't need a PlaceHolderVar: there is
+nothing that'd force the value to null, so the pulled-up expression is
+fine to use as-is.)  In a finished plan, a PlaceHolderVar becomes just
+the contained expression at whatever plan level it's supposed to be
+evaluated at, and then upper-level occurrences are replaced by
+references to that output column of the lower plan level.  That causes
+the value to go to null when appropriate at an outer join, in the same
+way as for Vars.  Thus, PlaceHolderVars are never seen outside the
+planner.
+
+PlaceHolderVars (PHVs) are more complicated than Vars in another way:
+their original value might need to be calculated at a join, not a
+base-level relation scan.  This can happen if a pulled-up subquery
+contains a join.  Because of this, a PHV can create a join order
+constraint that wouldn't otherwise exist, to ensure that it can
+be calculated before it is used.  A PHV's expression can also contain
+LATERAL references, adding complications that are discussed below.
+
+
+Relation Identification and Qual Clause Placement
+-------------------------------------------------
+
+A qual clause obtained from WHERE or JOIN/ON can be enforced at the lowest
+scan or join level that includes all relations used in the clause.  For
+this purpose we consider that outer joins listed in varnullingrels or
+phnullingrels are used in the clause, since we can't compute the qual's
+result correctly until we know whether such Vars have gone to null.
+
+The one exception to this general rule is that a non-degenerate outer
+JOIN/ON qual (one that references the non-nullable side of the join)
+cannot be enforced below that join, even if it doesn't reference the
+nullable side.  Pushing it down into the non-nullable side would result
+in rows disappearing from the join's result, rather than appearing as
+null-extended rows.  To handle that, when we identify such a qual we
+artificially add the join's minimum input relid set to the set of
+relations it is considered to use, forcing it to be evaluated exactly at
+that join level.  The same happens for outer-join quals that mention no
+relations at all.
+
+When attaching a qual clause to a join plan node that is performing
+an outer join, the qual clause is considered a "join clause" (that
+is, it is applied before the join performs null-extension) if it does
+not use that specific outer join, or a "filter clause" (applied after
+null-extension) if it does use that outer join.
+
+These things lead us to identify join relations within the planner
+by the sets of base relation RT indexes plus outer join RT indexes
+that they include.  In that way, the sets of relations used by qual
+clauses can be directly compared to join relations' relid sets to
+see where to place the clauses.  These identifying sets are unique
+because, for any given collection of base relations, there is only
+one valid set of outer joins to have performed along the way to
+joining that set of base relations (although the order of applying
+them could vary, as discussed above).
+
+SEMI joins do not have RT indexes, because they are artifacts made by
+the planner rather than the parser.  (We could create rangetable
+entries for them, but there seems no need at present.)  This does not
+cause a problem for qual placement, because the nullable side of a
+semijoin is not referenceable from above the join, so there is never a
+need to cite it in varnullingrels or phnullingrels.  It does not cause
+a problem for join relation identification either, since again whether
+a semijoin has been completed is implicit in the set of base relations
+included in the join.
+
+
 Optimizer Functions
 -------------------
 
@@ -437,11 +622,10 @@ inputs.
 EquivalenceClasses
 ------------------
 
-During the deconstruct_jointree() scan of the query's qual clauses, we look
-for mergejoinable equality clauses A = B whose applicability is not delayed
-by an outer join; these are called "equivalence clauses".  When we find
-one, we create an EquivalenceClass containing the expressions A and B to
-record this knowledge.  If we later find another equivalence clause B = C,
+During the deconstruct_jointree() scan of the query's qual clauses, we
+look for mergejoinable equality clauses A = B.  When we find one, we
+create an EquivalenceClass containing the expressions A and B to record
+that they are equal.  If we later find another equivalence clause B = C,
 we add C to the existing EquivalenceClass for {A B}; this may require
 merging two existing EquivalenceClasses.  At the end of the scan, we have
 sets of values that are known all transitively equal to each other.  We can
@@ -473,15 +657,26 @@ asserts that at any plan node where more than one of its member values
 can be computed, output rows in which the values are not all equal may
 be discarded without affecting the query result.  (We require all levels
 of the plan to enforce EquivalenceClasses, hence a join need not recheck
-equality of values that were computable by one of its children.)  For an
-ordinary EquivalenceClass that is "valid everywhere", we can further infer
-that the values are all non-null, because all mergejoinable operators are
-strict.  However, we also allow equivalence clauses that appear below the
-nullable side of an outer join to form EquivalenceClasses; for these
-classes, the interpretation is that either all the values are equal, or
-all (except pseudo-constants) have gone to null.  (This requires a
-limitation that non-constant members be strict, else they might not go
-to null when the other members do.)  Consider for example
+equality of values that were computable by one of its children.)
+
+We can further infer that the values are all non-null, because all
+mergejoinable operators are strict.  Deciding EquivalenceClass membership
+is a little tricky in the presence of outer joins.  Consider
+
+	SELECT *
+	  FROM a LEFT JOIN
+	       (SELECT * FROM b LEFT JOIN c ON b.y = c.z WHERE b.y = 10) ss
+	       ON a.x = ss.y
+	  WHERE a.x = 42;
+
+We can form the EquivalenceClass {b.y c.z 10} and thereby apply c.z = 10
+while scanning c.  However, it would be incorrect to conclude that a.x
+is also a member of that EquivalenceClass.  Instead, we form a second
+EquivalenceClass {a.x ss.y 42}, where (as discussed earlier) ss.y
+references the same table column as b.y but has a different
+varnullingrels label and is therefore considered a distinct Var.
+
+If the lower join were INNER:
 
 	SELECT *
 	  FROM a LEFT JOIN
@@ -489,40 +684,23 @@ to null when the other members do.)  Consider for example
 	       ON a.x = ss.y
 	  WHERE a.x = 42;
 
-We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
-apply c.z = 10 while scanning c.  (The reason we disallow outerjoin-delayed
-clauses from forming EquivalenceClasses is exactly that we want to be able
-to push any derived clauses as far down as possible.)  But once above the
-outer join it's no longer necessarily the case that b.y = 10, and thus we
-cannot use such EquivalenceClasses to conclude that sorting is unnecessary
-(see discussion of PathKeys below).
-
-In this example, notice also that a.x = ss.y (really a.x = b.y) is not an
-equivalence clause because its applicability to b is delayed by the outer
-join; thus we do not try to insert b.y into the equivalence class {a.x 42}.
-But since we see that a.x has been equated to 42 above the outer join, we
-are able to form a below-outer-join class {b.y 42}; this restriction can be
-added because no b/c row not having b.y = 42 can contribute to the result
-of the outer join, and so we need not compute such rows.  Now this class
-will get merged with {b.y c.z 10}, leading to the contradiction 10 = 42,
-which lets the planner deduce that the b/c join need not be computed at all
-because none of its rows can contribute to the outer join.  (This gets
-implemented as a gating Result filter, since more usually the potential
-contradiction involves Param values rather than just Consts, and thus has
-to be checked at runtime.)
+then ss.y is not any different from b.y and we'd end up with the
+EquivalenceClass {a.x b.y c.z 10 42}.  This leads to the contradiction
+10 = 42, which lets the planner deduce that the b/c join need not be
+computed at all because none of its rows can contribute to the outer
+join.  (This gets implemented as a gating Result filter, since more
+usually the potential contradiction involves Param values rather than
+just Consts, and thus has to be checked at runtime.)
 
 To aid in determining the sort ordering(s) that can work with a mergejoin,
 we mark each mergejoinable clause with the EquivalenceClasses of its left
-and right inputs.  For an equivalence clause, these are of course the same
-EquivalenceClass.  For a non-equivalence mergejoinable clause (such as an
-outer-join qualification), we generate two separate EquivalenceClasses for
-the left and right inputs.  This may result in creating single-item
-equivalence "classes", though of course these are still subject to merging
-if other equivalence clauses are later found to bear on the same
-expressions.
-
-Another way that we may form a single-item EquivalenceClass is in creation
-of a PathKey to represent a desired sort order (see below).  This is a bit
+and right inputs.  (These are in fact always the same EquivalenceClass.)
+
+In some cases we will form single-item EquivalenceClasses.  This happens
+if an ORDER BY or GROUP BY key is not mentioned in any equivalence
+clause.  We need to reason about sort orders in such queries, and our
+representation of sort ordering is a PathKey (see below) which uses an
+EquivalenceClass, so we have to make an EquivalenceClass.  This is a bit
 different from the above cases because such an EquivalenceClass might
 contain an aggregate function or volatile expression.  (A clause containing
 a volatile function will never be considered mergejoinable, even if its top
@@ -579,7 +757,7 @@ Index scans have Path.pathkeys that represent the chosen index's ordering,
 if any.  A single-key index would create a single-PathKey list, while a
 multi-column index generates a list with one element per key index column.
 Non-key columns specified in the INCLUDE clause of covering indexes don't
-have corresponding PathKeys in the list, because the have no influence on
+have corresponding PathKeys in the list, because they have no influence on
 index ordering.  (Actually, since an index can be scanned either forward or
 backward, there are two possible sort orders and two possible PathKey lists
 it can generate.)
@@ -655,14 +833,9 @@ redundancy, we save time and improve planning, since the planner will more
 easily recognize equivalent orderings as being equivalent.
 
 Another interesting property is that if the underlying EquivalenceClass
-contains a constant and is not below an outer join, then the pathkey is
-completely redundant and need not be sorted by at all!  Every row must
-contain the same constant value, so there's no need to sort.  (If the EC is
-below an outer join, we still have to sort, since some of the rows might
-have gone to null and others not.  In this case we must be careful to pick
-a non-const member to sort by.  The assumption that all the non-const
-members go to null at the same plan level is critical here, else they might
-not produce the same sort order.)  This might seem pointless because users
+contains a constant, then the pathkey is completely redundant and need
+not be sorted by at all!  Every row must contain the same value, so
+there's no need to sort.  This might seem pointless because users
 are unlikely to write "... WHERE x = 42 ORDER BY x", but it allows us to
 recognize when particular index columns are irrelevant to the sort order:
 if we have "... WHERE x = 42 ORDER BY y", scanning an index on (x,y)
@@ -670,15 +843,6 @@ produces correctly ordered data without a sort step.  We used to have very
 ugly ad-hoc code to recognize that in limited contexts, but discarding
 constant ECs from pathkeys makes it happen cleanly and automatically.
 
-You might object that a below-outer-join EquivalenceClass doesn't always
-represent the same values at every level of the join tree, and so using
-it to uniquely identify a sort order is dubious.  This is true, but we
-can avoid dealing with the fact explicitly because we always consider that
-an outer join destroys any ordering of its nullable inputs.  Thus, even
-if a path was sorted by {a.x} below an outer join, we'll re-sort if that
-sort ordering was important; and so using the same PathKey for both sort
-orderings doesn't create any real problem.
-
 
 Order of processing for EquivalenceClasses and PathKeys
 -------------------------------------------------------
