pgsql: Prevent functional dependency estimates from exceeding column es

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Prevent functional dependency estimates from exceeding column es
Date: 2020-03-28 12:51:33
Message-ID: E1jIAw1-0007tW-J0@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Prevent functional dependency estimates from exceeding column estimates.

Formerly we applied a functional dependency "a => b with dependency
degree f" using the formula

P(a,b) = P(a) * [f + (1-f)*P(b)]

This leads to the possibility that the combined selectivity P(a,b)
could exceed P(b), which is not ideal. The addition of support for IN
and OR clauses (commits 8f321bd16c and ccaa3569f5) would seem to make
this more likely, since the user-supplied values in such clauses are
not necessarily compatible with the functional dependency.

Mitigate this by using the formula

P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)

instead, which guarantees that the combined selectivity is less than
each column's individual selectivity. Logically, this is modifies the
part of the formula that accounts for dependent rows to handle cases
where P(a) > P(b), whilst not changing the second term which accounts
for independent rows.

Additionally, this refactors the way that functional dependencies are
applied, so now dependencies_clauselist_selectivity() estimates both
the implying clauses and the implied clauses for each functional
dependency (formerly only the implied clauses were estimated), and now
all clauses for each attribute are taken into account (formerly only
one clause for each implied attribute was estimated). This removes the
previously built-in assumption that only equality clauses will be
seen, which is no longer true, and opens up the possibility of
applying functional dependencies to more general clauses.

Patch by me, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com
Discussion: https://postgr.es/m/20200318002946.6dvblukm3cfmgir2%40development

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/87779aa47463d0fb3b4743a7d5c9534994bf7c98

Modified Files
--------------
src/backend/statistics/dependencies.c | 349 ++++++++++++++++++++++----------
src/test/regress/expected/stats_ext.out | 18 +-
src/test/regress/sql/stats_ext.sql | 8 +-
3 files changed, 266 insertions(+), 109 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Eisentraut 2020-03-28 14:06:59 pgsql: Enable Unix-domain sockets support on Windows
Previous Message Peter Eisentraut 2020-03-28 07:48:30 pgsql: Cleanup in SQL features files