Re: multivariate statistics (v24)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Fetter <david(at)fetter(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multivariate statistics (v24)
Date: 2017-03-02 03:05:34
Message-ID: a78ffb17-70e8-a55a-c10c-66ab575e88ed@2ndquadrant.com
Lists: pgsql-hackers

OK,

attached is v24 of the patch series, addressing most of the reported
issues and comments (at least I believe so). The main changes are:

1) I've mostly abandoned the "multivariate" name in favor of "extended",
particularly in places referring to statistics stored in the
pg_statistic_ext catalog in general. "Multivariate" is now used only in
places talking about particular types (e.g. multivariate histograms).

The "extended" name is more widely used for this type of statistics, and
the assumption is that we'll also add other (non-multivariate) types of
statistics - e.g. statistics on custom expressions, or some form of join
statistics.

2) Catalog pg_mv_statistic was renamed to pg_statistic_ext (and
pg_mv_stats view renamed to pg_stats_ext).

3) The structure of pg_statistic_ext was changed as proposed by Alvaro,
i.e. the boolean flags were removed and instead we have just a single
"char[]" column with a list of the enabled statistics.

4) I also got rid of the "mv" part in most variable/function/constant
names, replacing it with "ext" or something similar. Also mvstats.h got
renamed to stats.h.

5) Moved the files from src/backend/utils/mvstats to backend/statistics.

6) Fixed the n_choose_k() overflow issues by using the algorithm
proposed by Dean. Also, use the simple formula for num_combinations().

7) I've tweaked data types for a few struct members (in stats.h). I've
kept most of the uint32 fields at the top level though, because int16
might not be large enough for large statistics and the overhead is
minimal (compared to the space needed e.g. for histogram buckets).
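
To illustrate item 6, here is a minimal sketch (not the code from the
attached patches) of one common way to compute a binomial coefficient
without forming large factorials: multiply and divide alternately, so the
running value is always itself a binomial coefficient. Whether this matches
the algorithm Dean proposed is an assumption. The num_combinations()
formula shown, 2^n - n - 1 (the number of column subsets with at least two
members), is likewise an assumption about what "the simple formula" means,
and the uint64_t return type is chosen only for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Binomial coefficient C(n, k).
 *
 * After step d the running value equals C(n, d) for the original n, so
 * every division is exact and the largest intermediate value is
 * C(n, k) * k rather than n!.
 */
static uint64_t
n_choose_k(int n, int k)
{
    uint64_t r = 1;
    int d;

    assert(k >= 0 && n >= k);

    /* use symmetry: C(n, k) == C(n, n - k) */
    if (k > n - k)
        k = n - k;

    for (d = 1; d <= k; d++)
    {
        r *= (uint64_t) n--;
        r /= (uint64_t) d;
    }

    return r;
}

/*
 * Number of subsets of n items that contain at least two items
 * (assumed meaning of "the simple formula"): 2^n - n - 1.
 */
static uint64_t
num_combinations(int n)
{
    assert(n >= 0 && n < 63);

    return ((uint64_t) 1 << n) - (uint64_t) (n + 1);
}
```

For example, n_choose_k(30, 15) stays well within 64 bits here, while a
version computing 30! first would overflow.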

The renames/changes were quite widespread, but I've done my best to fix
all the comments and various other places.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment                                                          Content-Type      Size
0001-teach-pull_-varno-varattno-_walker-about-Restric-v24.patch.gz  application/gzip  730 bytes
0002-PATCH-shared-infrastructure-and-ndistinct-coeffi-v24.patch.gz  application/gzip  37.6 KB
0003-PATCH-functional-dependencies-only-the-ANALYZE-p-v24.patch.gz  application/gzip  16.3 KB
0004-PATCH-selectivity-estimation-using-functional-de-v24.patch.gz  application/gzip  14.0 KB
0005-PATCH-multivariate-MCV-lists-v24.patch.gz                      application/gzip  31.8 KB
0006-PATCH-multivariate-histograms-v24.patch.gz                     application/gzip  39.1 KB
0007-WIP-use-ndistinct-for-selectivity-estimation-in--v24.patch.gz  application/gzip  4.3 KB
0008-WIP-allow-using-multiple-statistics-in-clauselis-v24.patch.gz  application/gzip  2.0 KB
0009-WIP-psql-tab-completion-basics-v24.patch.gz                    application/gzip  1.3 KB
