Re: Use zero for nullness estimates of system attributes

From: Jim Finnerty <jfinnert(at)amazon(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Use zero for nullness estimates of system attributes
Date: 2019-01-26 15:26:25
Message-ID: 1548516385152-0.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

It's related, but what I was referring to applies even to the uncorrelated
case: suppose you have something like:

select x, sum(z)
from t
where
x > 5 and y in ('a', 'b', 'c')
group by x;

let's say that 'a', 'b', and 'c' are not frequent values of y, so the
estimated selectivity is based on the n_distinct of y and the 3 values. Now
imagine that x > 5 is applied first. That reduces the number of qualifying
rows by the selectivity of (x > 5), but it may also reduce the number of
distinct values of y. If it reduces the n_distinct of y, then the IN
predicate selectivity should be adjusted also.

The negative n_distinct representation (for n_distinct a large fraction of
numrows) already accounts for an automatic scaling of n_distinct when the
table size grows or shrinks. Adjusting the n_distinct of y after applying
predicate (x > 5) is conceptually the same thing, except that the scaling is
not linear, but would be based on a (fairly cheap) probabilistic formula of
how many distinct y's can be expected to survive after applying predicate x,
rather than assuming that all of them do.

If you have correlation information for columns (x, y) you'd want to use
that information instead, but in the absence of that information you could
make this adjustment.

It might make a good intern project if somebody wants to investigate it.

-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-26 15:41:59 Re: [Patch] Log10 and hyperbolic functions for SQL:2016 compliance
Previous Message Tom Lane 2019-01-26 14:57:10 Re: Thread-unsafe coding in ecpg