Quick Links

Using multiple extended statistics for estimates

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Using multiple extended statistics for estimates
Date:	2019-10-28 15:20:48
Message-ID:	20191028152048.jc6pqv5hb7j77ocp@development
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

PostgreSQL 10 introduced extended statistics, allowing us to consider
correlation between columns to improve estimates, and PostgreSQL 12
added support for MCV statistics. But we still had the limitation that
we only allowed using a single extended statistics per relation, i.e.
given a table with two extended stats

CREATE TABLE t (a int, b int, c int, d int);
CREATE STATISTICS s1 (mcv) ON a, b FROM t;
CREATE STATISTICS s2 (mcv) ON c, d FROM t;

and a query

SELECT * FROM t WHERE a = 1 AND b = 1 AND c = 1 AND d = 1;

we only ever used one of the statistics (and we considered them in a not
particularly well determined order).

This patch addresses this by using as many extended stats as possible,
by adding a loop to statext_mcv_clauselist_selectivity(). In each step
we pick the "best" applicable statistics (in the sense of covering the
most attributes) and factor it into the oveall estimate.

All this happens where we'd originally consider applying a single MCV
list, i.e. before even considering the functional dependencies, so
roughly like this:

while ()
{
... apply another MCV list ...
}

... apply functional dependencies ...

I've both in the loop, but I think that'd be wrong - the MCV list is
expected to contain more information about individual values (compared
to functional deps, which are column-level).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
use-multiple-extended-stats-v1.patch	text/plain	6.3 KB

Responses

Re: Using multiple extended statistics for estimates at 2019-11-06 19:54:40 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2019-10-28 15:28:49	Re: WIP/PoC for parallel backup
Previous Message	Tom Lane	2019-10-28 15:20:44	Re: Proposition to use '==' as synonym for 'IS NOT DISTINCT FROM'