Re: Bogus ANALYZE results for an otherwise-unique column with many nulls

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Andreas Joseph Krogh <andreas(at)visena(dot)com>
Subject: Re: Bogus ANALYZE results for an otherwise-unique column with many nulls
Date: 2016-08-05 17:40:53
Message-ID: 87oa57dsnn.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> Also, the way that the value is calculated in the
Tom> samples-not-all-distinct case corresponds to the way I have it in
Tom> the patch.

Ahh, gotcha. You're referring to this:

    /*
     * If we estimated the number of distinct values at more than 10% of
     * the total row count (a very arbitrary limit), then assume that
     * stadistinct should scale with the row count rather than be a fixed
     * value.
     */
    if (stats->stadistinct > 0.1 * totalrows)
        stats->stadistinct = -(stats->stadistinct / totalrows);

where "totalrows" obviously includes nulls. So this code expects a
negative stadistinct to be scaled by the total table size, and the
all-distinct case should do the same.
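
To make the arithmetic concrete, here is a minimal standalone sketch of the
rule quoted above. The function name scale_stadistinct and its plain double
parameters are hypothetical and chosen for illustration only; this is not
the PostgreSQL source, which operates on a stats struct as shown in the
snippet.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not PostgreSQL code): applies the quoted rule to a
 * raw distinct-value estimate.
 *
 * ndistinct  - estimated number of distinct non-null values
 * totalrows  - total rows in the table, nulls included
 *
 * Returns a positive count when the estimate is treated as fixed, or a
 * negative fraction of totalrows when it is assumed to scale with the
 * table size.
 */
static double
scale_stadistinct(double ndistinct, double totalrows)
{
    /* More than 10% of all rows: store as a negative multiplier. */
    if (ndistinct > 0.1 * totalrows)
        return -(ndistinct / totalrows);

    /* Otherwise keep the absolute estimate unchanged. */
    return ndistinct;
}
```

For a table of 1000 rows where 750 are null and the remaining 250 values
are all distinct, scale_stadistinct(250.0, 1000.0) gives -0.25: the
multiplier is a fraction of all rows, nulls included, not -1.0. With only
50 distinct values in the same table the estimate stays at 50.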

Objection withdrawn.

--
Andrew (irc:RhodiumToad)
