commit 60c3d589ccc990a387e91f63f17eb78fbc3d9c3f
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sun Oct 30 15:39:58 2022 -0400

    Add overview documentation.

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 41c120e0cd..360d37bcaa 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -295,6 +295,239 @@ Therefore, we don't merge FROM-lists if the result would have too many
 FROM-items in one list.
 
 
+Vars and PlaceHolderVars
+------------------------
+
+A Var node is simply the parse-tree representation of a table column
+reference.  However, in the presence of outer joins, that concept is
+more subtle than it might seem.  We need to distinguish the values of
+a Var "above" and "below" any outer join that could force the Var to
+null.  As an example, consider
+
+	SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z)
+
+(Assume foo() is not strict, so that we can't reduce the left join to
+a plain join.)  A naive implementation might try to push the foo(t2.z)
+call down to the scan of t2, but that is not correct because
+(a) what foo() should actually see for a null-extended join row is NULL,
+and (b) if foo() returns false, we should suppress the t1 row from the
+join altogether, not emit it with a null-extended t2 row.  On the other
+hand, it *would* be correct (and desirable) to push the call down to
+the scan level if the query were
+
+	SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y AND foo(t2.z))
+
+This motivates considering "t2.z" within the left join's ON clause
+to be a different value from "t2.z" outside the JOIN clause.  The
+former can be identified with t2.z as seen at the relation scan level,
+but the latter can't.
+
+Another example occurs in connection with EquivalenceClasses (discussed
+below).  Given
+
+	SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE t1.x = 42
+
+we would like to use the EquivalenceClass mechanisms to derive "t2.y = 42"
+to use as a restriction clause for the scan of t2.  (That works, because t2
+rows having y different from 42 cannot affect the query result.)  However,
+it'd be wrong to conclude that t2.y will be equal to t1.x in every joined
+row.  Part of the solution to this problem is to deem that "t2.y" in the
+ON clause refers to the relation-scan-level value of t2.y, but not to the
+value that y will have in joined rows, where it might be NULL rather than
+equal to t1.x.
+
+Therefore, Var nodes are decorated with "varnullingrels", which are sets
+of the rangetable indexes of outer joins that potentially null the Var
+at the point where it appears in the query.  (Using a set, not an ordered
+list, is fine since it doesn't matter which join forced the value to null;
+and that avoids having to change the representation when we consider
+different outer-join orders.)  In the examples above, all occurrences of
+t1.x would have empty varnullingrels, since the left join doesn't null t1.
+The t2 references within the JOIN ON clauses would also have empty
+varnullingrels.  But outside the JOIN clauses, any Vars referencing t2
+would have varnullingrels containing the index of the JOIN's rangetable
+entry (RTE), so that they'd be understood as potentially different from
+the t2 values seen at scan level.  Labeling t2.z in the WHERE clause with
+the JOIN's RT index lets us recognize that that occurrence of foo(t2.z)
+cannot be pushed down to the t2 scan level: we cannot evaluate that value
+at the scan level, but only after the join has been done.
+
+For LEFT and RIGHT outer joins, only Vars coming from the nullable side
+of the join are marked with that join's RT index.  For FULL joins, Vars
+from both inputs are marked.  (Such marking doesn't let us tell which
+side of the full join a Var came from; but that information can be found
+elsewhere at need.)
+
+Notionally, a Var having nonempty varnullingrels can be thought of as
+	CASE WHEN any-of-these-outer-joins-produced-a-null-extended-row
+	  THEN NULL
+	  ELSE the-scan-level-value-of-the-column
+	  END
+It's only notional, because no such calculation is ever done explicitly.
+In a finished plan, Vars occurring in scan-level plan nodes represent
+the actual table column values, but upper-level Vars are always
+references to outputs of lower-level plan nodes.  When a join node emits
+a null-extended row, it just returns nulls for the relevant output
+columns rather than copying up values from its input.  Because we don't
+ever have to do this calculation explicitly, it's not necessary to
+distinguish which side of an outer join got null-extended, which'd
+otherwise be essential information for FULL JOIN cases.
+
+Outer join identity 3 (discussed above) complicates this picture
+a bit.  In the form
+	A leftjoin (B leftjoin C on (Pbc)) on (Pab)
+all of the Vars in clauses Pbc and Pab will have empty varnullingrels,
+but if we start with
+	(A leftjoin B on (Pab)) leftjoin C on (Pbc)
+then the parser will have marked Pbc's B Vars with the A/B join's
+RT index, making this form artificially different from the first.
+For discussion's sake, let's denote this marking with a star:
+	(A leftjoin B on (Pab)) leftjoin C on (Pb*c)
+To cope with this, once we have detected that commuting these joins
+is legal, we generate both the Pbc and Pb*c forms of that ON clause,
+by either removing or adding the first join's RT index in the B Vars
+that the parser created.  While generating paths for a plan step that
+joins B and C, we include as a relevant join qual only the form that
+is appropriate depending on whether A has already been joined to B.
+
+It's also worth noting that identity 3 makes "the left join's RT index"
+itself a bit of a fuzzy concept, since the syntactic scope of each join
+RTE will depend on which form was produced by the parser.  We resolve
+this by considering that a left join's identity is determined by its
+minimum set of right-hand-side input relations.  In both forms allowed
+by identity 3, we can identify the first join as having minimum RHS B
+and the second join as having minimum RHS C.
+
+Another thing to notice is that C Vars appearing outside the nested
+JOIN clauses will be marked as nulled by both left joins if the
+original parser input was in the first form of identity 3, but if the
+parser input was in the second form, such Vars will only be marked as
+nulled by the second join.  This is not really a semantic problem:
+such Vars will be marked the same way throughout the upper part of the
+query, so they will all look equal(), which is correct; and they will not
+look equal() to any C Var appearing in the JOIN ON clause or below these
+joins.  However, when building Vars representing the outputs of join
+relations, we need to ensure that their varnullingrels are set to
+values consistent with the syntactic join order, so that they will
+appear equal() to pre-existing Vars in the upper part of the query.
+
+Outer joins also complicate handling of subquery pull-up.  Consider
+
+	SELECT ..., ss.x FROM tab1
+	  LEFT JOIN (SELECT *, 42 AS x FROM tab2) ss ON ...
+
+We want to be able to pull up the subquery as discussed previously,
+but we can't just replace the "ss.x" Var in the top-level SELECT list
+with the constant 42.  That'd result in always emitting 42, rather
+than emitting NULL in null-extended join rows.
+
+To solve this, we introduce the concept of PlaceHolderVars.
+A PlaceHolderVar is somewhat like a Var, in that its value originates
+at a relation scan level and can then be forced to null by higher-level
+outer joins; hence PlaceHolderVars carry a set of nulling rel IDs just
+like Vars.  Unlike a Var, whose original value comes from a table,
+a PlaceHolderVar's original value is defined by a query-determined
+expression ("42" in this example); so we represent the PlaceHolderVar
+as a node with that expression as child.  We insert a PlaceHolderVar
+whenever subquery pullup needs to replace a subquery-referencing Var
+that has nonempty varnullingrels with an expression that is not simply a
+Var.  (When the replacement expression is a pulled-up Var, we can just
+add the replaced Var's varnullingrels to its set.  Also, if the replaced
+Var has empty varnullingrels, we don't need a PlaceHolderVar: there is
+nothing that'd force the value to null, so the pulled-up expression is
+fine to use as-is.)  In a finished plan, a PlaceHolderVar becomes just
+the contained expression at whatever plan level it's supposed to be
+evaluated at, and then upper-level occurrences are replaced by
+references to that output column of the lower plan level.  That causes
+the value to go to null when appropriate at an outer join, in the same
+way as for Vars.  Thus, PlaceHolderVars are never seen outside the
+planner.
+
+PlaceHolderVars (PHVs) are more complicated than Vars in another way:
+their original value might need to be calculated at a join, not a
+base-level relation scan.  This can happen if a pulled-up subquery
+contains a join.  Because of this, a PHV can create a join order
+constraint that wouldn't otherwise exist, to ensure that it can
+be calculated before it is used.  A PHV's expression can also contain
+LATERAL references, adding complications that are discussed below.
+
+
+Relation Identification and Qual Clause Placement
+-------------------------------------------------
+
+A qual clause obtained from WHERE or JOIN/ON can be enforced at the lowest
+scan or join level that includes all relations used in the clause.  For
+this purpose we consider that outer joins listed in varnullingrels or
+phnullingrels are used in the clause, since we can't compute the qual's
+result correctly until we know whether such Vars have gone to null.
+
+The one exception to this general rule is that a non-degenerate outer
+JOIN/ON qual (one that references the non-nullable side of the join)
+cannot be enforced below that join, even if it doesn't reference the
+nullable side.  Pushing it down into the non-nullable side would result
+in rows disappearing from the join's result, rather than appearing as
+null-extended rows.  To handle that, when we identify such a qual we
+artificially add the join's minimum input relid set to the set of
+relations it is considered to use, forcing it to be evaluated exactly at
+that join level.  The same happens for outer-join quals that mention no
+relations at all.
+
+When attaching a qual clause to a join plan node that is performing an
+outer join, the qual clause is considered a "join clause" (that is, it is
+applied before the join performs null-extension) if it does not reference
+that outer join in any varnullingrels or phnullingrels set, or a "filter
+clause" (applied after null-extension) if it does reference that outer
+join.  A qual clause that originally appeared in that outer join's JOIN/ON
+will fall into the first category, since the parser would not have marked
+any of its Vars as referencing the outer join.  A qual clause that
+originally came from some upper ON clause or WHERE clause will be seen as
+referencing the outer join if it references any of the nullable side's
+Vars, since those Vars will be so marked by the parser.  But, if such a
+qual does not reference any nullable-side Vars, it's okay to push it down
+into the non-nullable side, so it won't get attached to the join node in
+the first place.
+
+These things lead us to identify join relations within the planner
+by the sets of base relation RT indexes plus outer join RT indexes
+that they include.  In that way, the sets of relations used by qual
+clauses can be directly compared to join relations' relid sets to
+see where to place the clauses.  These identifying sets are unique
+because, for any given collection of base relations, there is only
+one valid set of outer joins to have performed along the way to
+joining that set of base relations (although the order of applying
+them could vary, as discussed above).
+
+SEMI joins do not have RT indexes, because they are artifacts made by
+the planner rather than the parser.  (We could create rangetable
+entries for them, but there seems no need at present.)  This does not
+cause a problem for qual placement, because the nullable side of a
+semijoin is not referenceable from above the join, so there is never a
+need to cite it in varnullingrels or phnullingrels.  It does not cause a
+problem for join relation identification either, since whether a semijoin
+has been completed is again implicit in the set of base relations
+included in the join.
+
+There is one additional complication for qual clause placement, which
+occurs when we have made multiple versions of an outer-join clause as
+described previously (that is, we have both "Pbc" and "Pb*c" forms of
+the same clause seen in outer join identity 3).  When forming an outer
+join we only want to apply one of the redundant versions of the clause.
+If we are forming the B/C join without having yet computed the A/B
+join, it's easy to reject the "Pb*c" form since its required relid
+set includes the A/B join relid which is not in the input.  However,
+if we form B/C after A/B, then both forms of the clause are applicable
+so far as that test can tell.  We have to look more closely to notice
+that the "Pbc" clause form refers to relation B, which is no longer
+directly accessible.  While this check is straightforward, it's not
+especially cheap (see clause_is_computable_at()).  To avoid doing it
+unnecessarily, we mark the variant versions of a redundant clause as
+either "has_clone" or "is_clone".  A production build of Postgres runs
+clause_is_computable_at() only on clauses so marked, to decide which
+clone to apply at a given join level.  In debug builds, we also Assert
+that non-clone clauses are validly computable, but that seems too
+expensive for production usage.
+
+
 Optimizer Functions
 -------------------
 
@@ -437,11 +670,10 @@ inputs.
 EquivalenceClasses
 ------------------
 
-During the deconstruct_jointree() scan of the query's qual clauses, we look
-for mergejoinable equality clauses A = B whose applicability is not delayed
-by an outer join; these are called "equivalence clauses".  When we find
-one, we create an EquivalenceClass containing the expressions A and B to
-record this knowledge.  If we later find another equivalence clause B = C,
+During the deconstruct_jointree() scan of the query's qual clauses, we
+look for mergejoinable equality clauses A = B.  When we find one, we
+create an EquivalenceClass containing the expressions A and B to record
+that they are equal.  If we later find another equivalence clause B = C,
 we add C to the existing EquivalenceClass for {A B}; this may require
 merging two existing EquivalenceClasses.  At the end of the scan, we have
 sets of values that are known all transitively equal to each other.  We can
@@ -473,15 +705,54 @@ asserts that at any plan node where more than one of its member values
 can be computed, output rows in which the values are not all equal may
 be discarded without affecting the query result.  (We require all levels
 of the plan to enforce EquivalenceClasses, hence a join need not recheck
-equality of values that were computable by one of its children.)  For an
-ordinary EquivalenceClass that is "valid everywhere", we can further infer
-that the values are all non-null, because all mergejoinable operators are
-strict.  However, we also allow equivalence clauses that appear below the
-nullable side of an outer join to form EquivalenceClasses; for these
-classes, the interpretation is that either all the values are equal, or
-all (except pseudo-constants) have gone to null.  (This requires a
-limitation that non-constant members be strict, else they might not go
-to null when the other members do.)  Consider for example
+equality of values that were computable by one of its children.)
+
+It's tempting to include equality clauses appearing in outer-join
+conditions as sources of EquivalenceClasses, but there's a serious
+difficulty: the resulting deductions are not valid everywhere.
+For example, given
+
+	SELECT * FROM a LEFT JOIN b ON a.x = b.y WHERE a.x = 42;
+
+we could safely derive b.y = 42 and use that in the scan of B,
+because B rows not having b.y = 42 will not contribute to the
+join result.  Likewise, given
+
+	SELECT * FROM a LEFT JOIN b ON a.x = b.y AND a.x = b.z;
+
+it's all right to apply b.y = b.z while scanning B, and then only
+one of the two equality conditions need be tested at the join.
+However, if we have
+
+	SELECT * FROM a LEFT JOIN b ON a.x1 = b.y AND a.x2 = b.y;
+
+it'd be completely incorrect to push "a.x1 = a.x2" down to the scan
+of A.  Rows where they are different should not be eliminated from
+the join result, but instead produce null-extended join rows.
+
+In general, therefore, we can treat outer-join equalities somewhat like
+real equivalences, but we can only produce derived clauses at that
+outer join and at scans and joins contained within its nullable side.
+(FULL JOIN conditions can't be optimized at all this way, since derived
+clauses couldn't be enforced on either side.)
+
+Another instructive example is:
+
+	SELECT *
+	  FROM a LEFT JOIN
+	       (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
+	       ON a.x = ss.y
+	  ORDER BY ss.y;
+
+We can form the EquivalenceClass {b.y c.z 10} and thereby apply c.z = 10
+while scanning c.  However, this does not tell us anything about the
+ss.y reference appearing in ORDER BY (which is another name for b.y*,
+that is the possibly-nulled form of b.y), so we don't get to conclude
+that sorting for the ORDER BY is unnecessary, as it would be if we could
+prove that b.y* is equal to a constant (see discussion of PathKeys
+below).
+
+Also consider this variant:
 
 	SELECT *
 	  FROM a LEFT JOIN
@@ -489,40 +760,60 @@ to null when the other members do.)  Consider for example
 	       ON a.x = ss.y
 	  WHERE a.x = 42;
 
-We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
-apply c.z = 10 while scanning c.  (The reason we disallow outerjoin-delayed
-clauses from forming EquivalenceClasses is exactly that we want to be able
-to push any derived clauses as far down as possible.)  But once above the
-outer join it's no longer necessarily the case that b.y = 10, and thus we
-cannot use such EquivalenceClasses to conclude that sorting is unnecessary
-(see discussion of PathKeys below).
-
-In this example, notice also that a.x = ss.y (really a.x = b.y) is not an
-equivalence clause because its applicability to b is delayed by the outer
-join; thus we do not try to insert b.y into the equivalence class {a.x 42}.
-But since we see that a.x has been equated to 42 above the outer join, we
-are able to form a below-outer-join class {b.y 42}; this restriction can be
-added because no b/c row not having b.y = 42 can contribute to the result
-of the outer join, and so we need not compute such rows.  Now this class
-will get merged with {b.y c.z 10}, leading to the contradiction 10 = 42,
+Here, we have an EquivalenceClass {a.x 42} in addition to {b.y c.z 10},
+and we have an outer-join condition a.x = b.y (not b.y*).  That lets us
+derive b.y = 42, but we can only constrain scans/joins below the left join
+that way.  Nonetheless, we can still produce the contradiction 10 = 42,
 which lets the planner deduce that the b/c join need not be computed at all
 because none of its rows can contribute to the outer join.  (This gets
 implemented as a gating Result filter, since more usually the potential
 contradiction involves Param values rather than just Consts, and thus has
 to be checked at runtime.)
 
+To handle outer-join conditions this way, we put their left and right
+operands into EquivalenceClasses in the usual way.  (This may result in
+creating single-item equivalence "classes", though of course these are
+still subject to merging if other equivalence clauses are found that
+mention the same Vars.)  We do not merge those two EquivalenceClasses
+as would happen with an ordinary equivalence condition.  Instead, the
+outer-join condition is recorded in a separate "ConstrainedEquivalence"
+data structure, showing the EquivalenceClasses it connects and the scope
+of the outer join that it is valid within.  We can make deductions as
+if the two classes were one, but only when considering a scan or join
+within the scope of the constrained equivalence.
+
 To aid in determining the sort ordering(s) that can work with a mergejoin,
 we mark each mergejoinable clause with the EquivalenceClasses of its left
-and right inputs.  For an equivalence clause, these are of course the same
-EquivalenceClass.  For a non-equivalence mergejoinable clause (such as an
-outer-join qualification), we generate two separate EquivalenceClasses for
-the left and right inputs.  This may result in creating single-item
-equivalence "classes", though of course these are still subject to merging
-if other equivalence clauses are later found to bear on the same
-expressions.
+and right inputs.  For an ordinary equivalence clause these will be the
+same EquivalenceClass, since processing of the clause itself causes its
+inputs to be put into the same EquivalenceClass.  But as described above,
+mergejoinable outer-join clauses will end up with different
+EquivalenceClasses for left and right sides.
+
+There is an additional complication when re-ordering outer joins according
+to identity 3.  Recall that the two choices we consider for such joins are
+	A leftjoin (B leftjoin C on (Pbc)) on (Pab)
+	(A leftjoin B on (Pab)) leftjoin C on (Pb*c)
+where the star denotes varnullingrels markers on B's Vars.  When Pbc
+is (or includes) a mergejoinable clause, we have something like
+	A leftjoin (B leftjoin C on (b.b = c.c)) on (Pab)
+	(A leftjoin B on (Pab)) leftjoin C on (b.b* = c.c)
+We could generate a ConstrainedEquivalence linking b.b and c.c, and
+another one linking b.b* and c.c.  (b.b and b.b* are necessarily in
+different EquivalenceClasses: there is no mechanism whereby they
+could be found to be equal.)  However, these would generate largely
+duplicative conditions.  Conditions involving b.b* can't be computed
+below this join nest, and any that can be computed would be duplicative
+of what we'd get from the b.b/c.c ConstrainedEquivalence.  Therefore,
+we choose to generate a ConstrainedEquivalence for b.b and c.c, but
+"b.b* = c.c" is handled as just an ordinary clause.
 
 Another way that we may form a single-item EquivalenceClass is in creation
-of a PathKey to represent a desired sort order (see below).  This is a bit
+of a PathKey to represent a desired sort order (see below).  This happens
+if an ORDER BY or GROUP BY key is not mentioned in any equivalence
+clause.  We need to reason about sort orders in such queries, and our
+representation of sort ordering is a PathKey, which uses an
+EquivalenceClass, so we have to make one.  This is a bit
 different from the above cases because such an EquivalenceClass might
 contain an aggregate function or volatile expression.  (A clause containing
 a volatile function will never be considered mergejoinable, even if its top
@@ -579,7 +870,7 @@ Index scans have Path.pathkeys that represent the chosen index's ordering,
 if any.  A single-key index would create a single-PathKey list, while a
 multi-column index generates a list with one element per key index column.
 Non-key columns specified in the INCLUDE clause of covering indexes don't
-have corresponding PathKeys in the list, because the have no influence on
+have corresponding PathKeys in the list, because they have no influence on
 index ordering.  (Actually, since an index can be scanned either forward or
 backward, there are two possible sort orders and two possible PathKey lists
 it can generate.)
@@ -655,14 +946,9 @@ redundancy, we save time and improve planning, since the planner will more
 easily recognize equivalent orderings as being equivalent.
 
 Another interesting property is that if the underlying EquivalenceClass
-contains a constant and is not below an outer join, then the pathkey is
-completely redundant and need not be sorted by at all!  Every row must
-contain the same constant value, so there's no need to sort.  (If the EC is
-below an outer join, we still have to sort, since some of the rows might
-have gone to null and others not.  In this case we must be careful to pick
-a non-const member to sort by.  The assumption that all the non-const
-members go to null at the same plan level is critical here, else they might
-not produce the same sort order.)  This might seem pointless because users
+contains a constant, then the pathkey is completely redundant and need not
+be sorted by at all!  Every interesting row must contain the same value,
+so there's no need to sort.  This might seem pointless because users
 are unlikely to write "... WHERE x = 42 ORDER BY x", but it allows us to
 recognize when particular index columns are irrelevant to the sort order:
 if we have "... WHERE x = 42 ORDER BY y", scanning an index on (x,y)
@@ -670,15 +956,6 @@ produces correctly ordered data without a sort step.  We used to have very
 ugly ad-hoc code to recognize that in limited contexts, but discarding
 constant ECs from pathkeys makes it happen cleanly and automatically.
 
-You might object that a below-outer-join EquivalenceClass doesn't always
-represent the same values at every level of the join tree, and so using
-it to uniquely identify a sort order is dubious.  This is true, but we
-can avoid dealing with the fact explicitly because we always consider that
-an outer join destroys any ordering of its nullable inputs.  Thus, even
-if a path was sorted by {a.x} below an outer join, we'll re-sort if that
-sort ordering was important; and so using the same PathKey for both sort
-orderings doesn't create any real problem.
-
 
 Order of processing for EquivalenceClasses and PathKeys
 -------------------------------------------------------