From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Improve docs for n_distinct_inherited |
Date: | 2025-10-13 01:47:39 |
Message-ID: | E93C6658-A4E8-420B-8AFD-7E4EFFF81EDB@gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi David,
I think your revision is good and accurate.
> On Oct 13, 2025, at 07:42, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> Just picking this one up again. I forgot to come back to this after PGConf.dev.
>
> I came up with:
>
> Ordinarily <literal>n_distinct</literal> is used.
> <literal>n_distinct_inherited</literal> exists to allow the distinct
> estimate to be overwritten for the statistics gathered for inheritance
> parent tables and for partitioned tables.
This clarifies that n_distinct_inherited applies to both inheritance parents and partitioned tabled, that’s accurate and better than the original wording.
>
> I also fixed what I thought was some misleading text about ANALYZE
> using this value to calculate things. That's not true. It's the query
> planner that uses this value. ANALYZE just stores whatever this is set
> to into pg_statistic. I also adjusted the text that was talking about
> "the size of the table", which, as I mentioned earlier isn't correct.
> It's all related to the estimated number rows in the table, per
> "ntuples = vardata->rel->tuples;" in get_variable_numdistinct().
>
When I read the diff, I thought “table size” is usually used in PG docs, so maybe “estimated table size”. But as you explicitly explained why you made the change, I am fine with your change.
> Also fixed a typo; "twice on the average" shouldn't contain "the".
>
Correct grammer fix.
> I wonder if ", since the multiplication by the number of rows in the
> table is not performed until query planning time" should be deleted
> since I modified the text earlier to talk about "the query planner”.
>
Yeah, with your rewrite, that clause now feels a little redundant. I think it can be removed entirely.
The other thing that doesn’t belong to your change but as you are touching here:
“When set to a negative value, which must be greater than or equal to -1"
When I first time read the doc, I was confused. Because no easier sentence indicated “n_distinct” is of float type. I thought “greater than” was a typo. When I read through, the later example (0.5) resolved my confusion. To avoid the same confusion to other readers, maybe change to “when set to a negative value between -1 and 0 (inclusive of -1)” or “when set to a negative value, which must be in the range -1<= value < 0”.
But as I said, this doesn’t belong to your change. If you don’t want to enhance this place, or you don’t consider this is a problem, I am absolutely fine.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-10-13 01:54:11 | Re: Failure building libpq v18.0 on old aarch64 |
Previous Message | jian he | 2025-10-13 01:47:17 | Re: finish TODOs in to_json_is_immutable, to_jsonb_is_immutable also add tests on it |