From d9a633243aa7a2c3dc4e6b4b9c75a93823730d0d Mon Sep 17 00:00:00 2001
From: "David G. Johnston"
Date: Thu, 9 Jun 2022 15:14:20 +0000
Subject: [PATCH] doc: Make selectivity example match text; rewrite proof in
code
---
doc/src/sgml/planstats.sgml | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index df85ea5eea..bdd2f4afd0 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -391,18 +391,20 @@ tablename | null_frac | n_distinct | most_common_vals
In this case there is no MCV information for
- unique2 because all the values appear to be
- unique, so we use an algorithm that relies only on the number of
- distinct values for both relations together with their null fractions:
+ unique2 and all the values appear to be
+ unique (n_distinct = -1), so we use an algorithm that relies on the row
+ count estimates for both relations (num_rows, not shown, but "tenk")
+ together with the column null fractions (zero for both):
-selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2)
+selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_rows1, num_rows2)
= (1 - 0) * (1 - 0) / max(10000, 10000)
= 0.0001
This is, subtract the null fraction from one for each of the relations,
- and divide by the maximum of the numbers of distinct values.
+ and divide by the row count of the larger relation (this value does get
+ scaled in the non-unique case).
The number of rows
that the join is likely to emit is calculated as the cardinality of the
Cartesian product of the two inputs, multiplied by the
--
2.25.1