Re: PoC/WIP: Extended statistics on expressions

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2021-01-16 23:22:08
Message-ID: 20210116232208.GB8560@telsasoft.com
Lists: pgsql-hackers

On Sat, Jan 16, 2021 at 05:48:43PM +0100, Tomas Vondra wrote:
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>expr</structfield> <type>text</type>
> + </para>
> + <para>
> + Expression the extended statistics is defined on
> + </para></entry>

Expression the extended statistics ARE defined on
Or maybe say "Expression on which the extended statistics are defined"

> + <para>
> + The <command>CREATE STATISTICS</command> command has two basic forms. The
> + simple variant allows to build statistics for a single expression, does

.. ALLOWS BUILDING statistics for a single expression, AND does (or BUT does)

> + Expression statistics are per-expression and are similar to creating an
> + index on the expression, except that they avoid the overhead of the index.

Maybe say "overhead of index maintenance"

> + All functions and operators used in a statistics definition must be
> + <quote>immutable</quote>, that is, their results must depend only on
> + their arguments and never on any outside influence (such as
> + the contents of another table or the current time). This restriction

Maybe say "outside factor" or "external factor"

> + results of those expression, and uses default estimates as illustrated
> + by the first query. The planner also does not realize the value of the

realize THAT

> + second column fully defines the value of the other column, because date
> + truncated to day still identifies the month. Then expression and
> + ndistinct statistics are built on those two columns:

I got an error doing this:

CREATE TABLE t AS SELECT generate_series(1,9) AS i;
CREATE STATISTICS s ON (i+1) ,(i+1+0) FROM t;
ANALYZE t;
SELECT i+1 FROM t GROUP BY 1;
ERROR: corrupt MVNDistinct entry

--
Justin
