Re: Improve docs for n_distinct_inherited

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve docs for n_distinct_inherited
Date: 2025-10-13 01:47:39
Message-ID: E93C6658-A4E8-420B-8AFD-7E4EFFF81EDB@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

I think your revision is good and accurate.

> On Oct 13, 2025, at 07:42, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> Just picking this one up again. I forgot to come back to this after PGConf.dev.
>
> I came up with:
>
> Ordinarily <literal>n_distinct</literal> is used.
> <literal>n_distinct_inherited</literal> exists to allow the distinct
> estimate to be overwritten for the statistics gathered for inheritance
> parent tables and for partitioned tables.

This clarifies that n_distinct_inherited applies to both inheritance parents and partitioned tabled, that’s accurate and better than the original wording.

>
> I also fixed what I thought was some misleading text about ANALYZE
> using this value to calculate things. That's not true. It's the query
> planner that uses this value. ANALYZE just stores whatever this is set
> to into pg_statistic. I also adjusted the text that was talking about
> "the size of the table", which, as I mentioned earlier isn't correct.
> It's all related to the estimated number rows in the table, per
> "ntuples = vardata->rel->tuples;" in get_variable_numdistinct().
>

When I read the diff, I thought “table size” is usually used in PG docs, so maybe “estimated table size”. But as you explicitly explained why you made the change, I am fine with your change.

> Also fixed a typo; "twice on the average" shouldn't contain "the".
>

Correct grammer fix.

> I wonder if ", since the multiplication by the number of rows in the
> table is not performed until query planning time" should be deleted
> since I modified the text earlier to talk about "the query planner”.
>

Yeah, with your rewrite, that clause now feels a little redundant. I think it can be removed entirely.

The other thing that doesn’t belong to your change but as you are touching here:

“When set to a negative value, which must be greater than or equal to -1"

When I first time read the doc, I was confused. Because no easier sentence indicated “n_distinct” is of float type. I thought “greater than” was a typo. When I read through, the later example (0.5) resolved my confusion. To avoid the same confusion to other readers, maybe change to “when set to a negative value between -1 and 0 (inclusive of -1)” or “when set to a negative value, which must be in the range -1<= value < 0”.

But as I said, this doesn’t belong to your change. If you don’t want to enhance this place, or you don’t consider this is a problem, I am absolutely fine.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-10-13 01:54:11 Re: Failure building libpq v18.0 on old aarch64
Previous Message jian he 2025-10-13 01:47:17 Re: finish TODOs in to_json_is_immutable, to_jsonb_is_immutable also add tests on it