Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Zhenghua Lyu <zlyu(at)vmware(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?
Date: 2020-10-31 01:04:35
Message-ID: 149287.1604106275@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> * It's not apparent why, if ANALYZE's sample is all nulls, we wouldn't
> conclude stadistinct = 0 and thus arrive at the desired answer that
> way. (Since we have a complaint, I'm guessing that ANALYZE might
> disbelieve its own result and stick in some larger stadistinct. But
> then maybe that's where to fix this, not here.)

Oh, on second thought (and with some testing): ANALYZE *does* report
stadistinct = 0. The real issue is that get_variable_numdistinct is
assuming it can use that value as meaning "stadistinct is unknown".
So maybe we should just fix that, probably by adding an explicit
bool flag for that condition.
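
To make the distinction concrete, here is a minimal sketch in C. It is
not the actual selfuncs.c code: the ColumnStats struct, its have_stats
field, and both function names are made up for illustration, and the
fallback of 200 is assumed to match the planner's DEFAULT_NUM_DISTINCT.
The first function overloads zero as "unknown"; the second carries the
"no statistics" condition in a separate bool, so a genuine zero survives.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed fallback guess, meant to mirror the planner's DEFAULT_NUM_DISTINCT. */
#define DEFAULT_NUM_DISTINCT 200.0

/* Hypothetical, simplified stand-in for what the planner knows about a column. */
typedef struct ColumnStats
{
    bool   have_stats;   /* explicit flag: were statistics found at all? */
    double stadistinct;  /* > 0: absolute count; < 0: -(fraction of rows);
                          * 0: ANALYZE saw no non-null values */
    double stanullfrac;  /* fraction of rows that are NULL */
} ColumnStats;

/* Fallback used when nothing is known about the column. */
double
default_numdistinct(double ntuples)
{
    return (ntuples < DEFAULT_NUM_DISTINCT) ? ntuples : DEFAULT_NUM_DISTINCT;
}

/*
 * Sentinel style: zero doubles as "unknown", so an all-NULL column
 * (stadistinct = 0) is indistinguishable from a never-analyzed one.
 */
double
numdistinct_sentinel(double stadistinct, double ntuples)
{
    assert(ntuples >= 0);
    if (stadistinct > 0)
        return stadistinct;
    if (stadistinct < 0)
        return -stadistinct * ntuples;
    return default_numdistinct(ntuples);    /* zero treated as "unknown" */
}

/*
 * Flag style: "unknown" is signalled by have_stats, so a stored zero
 * is reported as zero and *isdefault stays false.
 */
double
numdistinct_flagged(const ColumnStats *stats, double ntuples, bool *isdefault)
{
    assert(ntuples >= 0);
    *isdefault = false;
    if (stats->have_stats)
    {
        if (stats->stadistinct >= 0)
            return stats->stadistinct;      /* includes a genuine zero */
        return -stats->stadistinct * ntuples;
    }
    *isdefault = true;
    return default_numdistinct(ntuples);
}
```

With stadistinct = 0 and 1000 rows, the sentinel version falls back to
the default guess, while the flagged version returns 0 and leaves
*isdefault false. Whether 0 is then clamped further up is a separate
decision not covered by this sketch.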

BTW ... I've not looked at the callers, but now I'm wondering whether
get_variable_numdistinct ought to count NULL as one of the "distinct"
values. In applications such as estimating the number of GROUP BY
groups, it seems like that would be correct. There might be some
callers that don't want it though.
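
As a rough illustration of the GROUP BY case only, the adjustment could
look like the sketch below. The helper name group_count_with_nulls is
hypothetical and nothing like it is claimed to exist in the planner; it
simply adds one group when the column is known to contain NULLs, since
GROUP BY places all NULLs in a single group.

```c
#include <assert.h>

/*
 * Hypothetical helper: estimated number of GROUP BY groups for one column,
 * given the count of distinct non-null values and the NULL fraction.
 * All NULLs fall into one group, so they contribute at most one.
 */
double
group_count_with_nulls(double ndistinct_nonnull, double stanullfrac)
{
    assert(ndistinct_nonnull >= 0.0);
    assert(stanullfrac >= 0.0 && stanullfrac <= 1.0);
    return ndistinct_nonnull + ((stanullfrac > 0.0) ? 1.0 : 0.0);
}
```

For an all-NULL column (zero non-null distinct values, stanullfrac = 1.0)
this yields one group. A caller estimating, say, equality-join
selectivity would presumably not want the extra one, which is why this
is shown as a separate helper rather than a change to the base estimate.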

regards, tom lane