diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
new file mode 100644
index dadca67..382cbd7
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7377,6 +7377,15 @@ SCRAM-SHA-256$<iteration
m for most common values (MCV) list statistics
+
+
+
+ stxexprs pg_node_tree
+
+
+ A list of any expressions covered by this statistics object.
+
+
@@ -7474,6 +7483,16 @@ SCRAM-SHA-256$<iteration
pg_mcv_list type
+
+
+
+ stxdexpr pg_statistic[]
+
+
+ Per-expression statistics, serialized as an array of
+ pg_statistic type
+
+
@@ -12843,7 +12862,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p
The view pg_stats_ext provides access to
- the information stored in the pg_statistic_ext
and pg_statistic_ext_data
catalogs. This view allows access only to rows of
@@ -12930,7 +12950,16 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p
(references pg_attribute.attname)
- Names of the columns the extended statistics is defined on
+ Names of the columns included in the extended statistics
+
+
+
+
+
+ exprs text[]
+
+
+ Expressions included in the extended statistics
@@ -13033,7 +13062,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p
The view pg_stats_ext_exprs provides access to
- the information stored in the pg_statistic_ext
and pg_statistic_ext_data
catalogs. This view allows access only to rows of
@@ -13119,7 +13149,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p
expr text
- Expression the extended statistics is defined on
+ Expression included in the extended statistics
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
new file mode 100644
index 5f3aefd..f561599
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -27,7 +27,7 @@ CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
[ ( statistics_kind [, ... ] ) ]
- ON { column_name | ( expression ) } [, ...]
+ ON { column_name | ( expression ) }, { column_name | ( expression ) } [, ...]
FROM table_name
@@ -45,12 +45,15 @@ CREATE STATISTICS [ IF NOT EXISTS ]
The CREATE STATISTICS command has two basic forms. The
- simple variant allows building statistics for a single expression, does
- not allow specifying any statistics kinds and provides benefits similar
- to an expression index. The full variant allows defining statistics objects
- on multiple columns and expressions, and selecting which statistics kinds will
- be built. The per-expression statistics are built automatically when there
- is at least one expression.
+ first form allows univariate statistics for a single expression to be
+ collected, providing benefits similar to an expression index without the
+ overhead of index maintenance. This form does not allow the statistics
+ kind to be specified, since the various statistics kinds refer only to
+ multivariate statistics. The second form of the command allows
+ multivariate statistics on multiple columns and/or expressions to be
+ collected, optionally specifying which statistics kinds to include. This
+ form will also automatically cause univariate statistics to be collected on
+ any expressions included in the list.
@@ -93,16 +96,16 @@ CREATE STATISTICS [ IF NOT EXISTS ] statistics_kind
- A statistics kind to be computed in this statistics object.
+ A multivariate statistics kind to be computed in this statistics object.
Currently supported kinds are
ndistinct, which enables n-distinct statistics,
dependencies, which enables functional
dependency statistics, and mcv which enables
most-common values lists.
If this clause is omitted, all supported statistics kinds are
- included in the statistics object. Expression statistics are built
- automatically when the statistics definition includes complex
- expressions and not just simple column references.
+ included in the statistics object. Univariate expression statistics are
+ built automatically if the statistics definition includes any complex
+ expressions rather than just simple column references.
For more information, see
and .
@@ -114,8 +117,9 @@ CREATE STATISTICS [ IF NOT EXISTS ]
The name of a table column to be covered by the computed statistics.
- At least two column names must be given; the order of the column names
- is insignificant.
+ This is only allowed when building multivariate statistics. At least
+ two column names or expressions must be specified, and their order is
+ not significant.
@@ -124,9 +128,11 @@ CREATE STATISTICS [ IF NOT EXISTS ] expression
- The expression to be covered by the computed statistics. In this case
- only a single expression is required, in which case only statistics
- for the expression are built.
+ An expression to be covered by the computed statistics. This may be
+ used to build univariate statistics on a single expression, or as part
+ of a list of multiple column names and/or expressions to build
+ multivariate statistics. In the latter case, separate univariate
+ statistics are built automatically for each expression in the list.
@@ -156,8 +162,8 @@ CREATE STATISTICS [ IF NOT EXISTS ]
Expression statistics are per-expression and are similar to creating an
index on the expression, except that they avoid the overhead of index
- maintenance. Expression statistics are built automatically when there
- is at least one expression in the statistics object definition.
+ maintenance. Expression statistics are built automatically for each
+ expression in the statistics object definition.
@@ -232,13 +238,12 @@ EXPLAIN ANALYZE SELECT * FROM t2 WHERE (
Create table t3 with a single timestamp column,
- and run a query using an expression on that column. Without extended
- statistics, the planner has no information about data distribution for
- results of those expression, and uses default estimates as illustrated
- by the first query. The planner also does not realize that the value of
- the second column fully determines the value of the other column, because
- date truncated to day still identifies the month. Then expression and
- ndistinct statistics are built on those two columns:
+ and run queries using expressions on that column. Without extended
+ statistics, the planner has no information about the data distribution for
+ the expressions, and uses default estimates. The planner also does not
+ realize that the value of the date truncated to the month is fully
+ determined by the value of the date truncated to the day. Then expression
+ and ndistinct statistics are built on those two expressions:
CREATE TABLE t3 (
@@ -262,7 +267,8 @@ EXPLAIN ANALYZE SELECT * FROM t3
EXPLAIN ANALYZE SELECT date_trunc('month', a), date_trunc('day', a)
FROM t3 GROUP BY 1, 2;
--- per-expression statistics are built automatically
+-- build ndistinct statistics on the pair of expressions (per-expression
+-- statistics are built automatically)
CREATE STATISTICS s3 (ndistinct) ON date_trunc('month', a), date_trunc('day', a) FROM t3;
ANALYZE t3;