From: | Jim Finnerty <jfinnert(at)amazon(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Use zero for nullness estimates of system attributes |
Date: | 2019-01-26 15:26:25 |
Message-ID: | 1548516385152-0.post@n3.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
It's related, but what I was referring to applies even to the uncorrelated
case: suppose you have something like:
select x, sum(z)
from t
where
x > 5 and y in ('a', 'b', 'c')
group by x;
let's say that 'a', 'b', and 'c' are not frequent values of y, so the
estimated selectivity is based on the n_distinct of y and the 3 values. Now
imagine that x > 5 is applied first. That reduces the number of qualifying
rows by the selectivity of (x > 5), but it may also reduce the number of
distinct values of y. If it reduces the n_distinct of y, then the IN
predicate selectivity should be adjusted also.
The negative n_distinct representation (for n_distinct a large fraction of
numrows) already accounts for an automatic scaling of n_distinct when the
table size grows or shrinks. Adjusting the n_distinct of y after applying
predicate (x > 5) is conceptually the same thing, except that the scaling is
not linear, but would be based on a (fairly cheap) probabilistic formula of
how many distinct y's can be expected to survive after applying predicate x,
rather than assuming that all of them do.
If you have correlation information for columns (x, y) you'd want to use
that information instead, but in the absence of that information you could
make this adjustment.
It might make a good intern project if somebody wants to investigate it.
-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-01-26 15:41:59 | Re: [Patch] Log10 and hyperbolic functions for SQL:2016 compliance |
Previous Message | Tom Lane | 2019-01-26 14:57:10 | Re: Thread-unsafe coding in ecpg |