From: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org, Andreas Joseph Krogh <andreas(at)visena(dot)com> |
Subject: | Re: Bogus ANALYZE results for an otherwise-unique column with many nulls |
Date: | 2016-08-05 10:29:42 |
Message-ID: | 87wpjvecwz.fsf@news-spur.riddles.org.uk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>>>>> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
Tom> What I did in the patch is to scale the formerly fixed "-1.0"
Tom> stadistinct estimate to discount the fraction of nulls we found.
Andrew> This seems quite dubious to me. stadistinct representing only
Andrew> the non-null values seems to me to be substantially more useful
Andrew> and less confusing; it should be up to consumers to take
Andrew> stanullfrac into account (in general they already do) since in
Andrew> many cases we explicitly do _not_ want to count nulls.
Hm. I am wrong about this, since it's the fact that consumers are taking
stanullfrac into account that makes the value wrong in the first place.
For example, if a million-row table has stanullfrac=0.9 and
stadistinct=-1, then get_variable_numdistinct is returning 1 million,
and (for example) var_eq_non_const divides 0.1 by that to give a
selectivity of 1 in 10 million, which is obviously wrong.
But I think the fix is still wrong, because it changes the meaning of
ALTER TABLE ... ALTER col SET (n_distinct=...) in a non-useful way; it
is no longer possible to nail down a useful negative n_distinct value if
the null fraction of the column is variable. Would it not make more
sense to do the adjustment in get_variable_numdistinct, instead?
--
Andrew (irc:RhodiumToad)
From | Date | Subject | |
---|---|---|---|
Next Message | Ashutosh Bapat | 2016-08-05 10:53:18 | Re: Declarative partitioning |
Previous Message | Simon Riggs | 2016-08-05 10:29:01 | Re: PostgreSQL 10 Roadmaps |