Re: Bogus ANALYZE results for an otherwise-unique column with many nulls

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andreas Joseph Krogh <andreas(at)visena(dot)com>
Subject: Re: Bogus ANALYZE results for an otherwise-unique column with many nulls
Date: 2016-08-07 07:01:40
Message-ID: CAEZATCVQ9AGw1thJiViYWHXXZ46_p6FfDPBeyTC9BSNDz+6L6g@mail.gmail.com
Lists: pgsql-hackers

On 5 August 2016 at 21:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> OK, thanks. What shall we do about Andreas' request to back-patch this?
> I'm personally willing to do it, but there is the old bugaboo of "maybe
> it will destabilize a plan that someone is happy with".
>

My inclination would be to back-patch it because arguably it's a
bug-fix -- at the very least the old behaviour didn't match the docs
for stadistinct:

The number of distinct nonnull data values in the column.
A value greater than zero is the actual number of distinct values.
A value less than zero is the negative of a multiplier for the number
of rows in the table; for example, a column in which values appear about
twice on the average could be represented by
<structfield>stadistinct</> = -0.5.

Additionally, I think that example is misleading because it only
holds if there are no null values in the column. Perhaps it would
help to have a more explicit example to illustrate how nulls affect
stadistinct, for example:

... for example, a column in which about 80% of the values are nonnull
and each nonnull value appears about twice on average could be
represented by <structfield>stadistinct</> = -0.4.
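
To spell out the arithmetic behind those two figures, here is a minimal
Python sketch. It assumes the documented definition (distinct nonnull
values divided by the total row count, nulls included, then negated);
the function name and its parameters are hypothetical, chosen only for
illustration, and are not anything in PostgreSQL itself.

```python
def stadistinct_multiplier(nonnull_fraction, avg_repeats):
    """Negative-multiplier form of stadistinct, per the documented definition.

    nonnull_fraction: fraction of all rows whose value is not null (0..1).
    avg_repeats: average number of times each distinct nonnull value appears.

    Hypothetical helper for illustration only.
    """
    # Distinct nonnull values per row of the table. Null rows count in the
    # denominator (all rows) but contribute no distinct values.
    distinct_per_row = nonnull_fraction / avg_repeats
    return -distinct_per_row


# Existing docs example: no nulls, each value appears about twice.
print(stadistinct_multiplier(1.0, 2))   # -0.5

# Proposed docs example: 80% nonnull, each nonnull value appears about twice.
print(stadistinct_multiplier(0.8, 2))   # -0.4

# Illustrative case: an otherwise-unique column in which half the rows are null.
print(stadistinct_multiplier(0.5, 1))   # -0.5
```

The first and third cases give the same multiplier, which is why the
no-nulls example on its own does not show how nulls enter the value.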

Regards,
Dean
