From: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Improve docs for n_distinct_inherited |
Date: | 2025-10-13 01:27:48 |
Message-ID: | CAKFQuwZ60bW_1qN=H0h4r_qN2eNNekKtEqxssRV=NqLNV_PYUg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Oct 12, 2025 at 7:42 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> Just picking this one up again. I forgot to come back to this after
> PGConf.dev.
>
> On Fri, 9 May 2025 at 02:50, David G. Johnston
> <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> > I was missing this key piece of knowledge which invalidated my entire
> attempt.
> >
> > Here's an attempt at shortening this now that I understand the mechanics
> better.
> >
> > Separate options exist because an inheritance parent table has two
> > different sets of statistics: one considering only itself and one
> which
> > also includes its children
> (<literal>n_distinct_inherited</literal>).
> > Partitioned tables, which only have rows in the children, likewise
> uses
> > the inherited option while everyone else uses
> <literal>n_distinct</literal>.
>
> I wasn't quite happy with that as the text indicates that
> n_distinct_inherited is the statistics. But, it's not, it's just the
> option that allows some modification of the gathered statistics.
>
> I came up with:
>
> Ordinarily <literal>n_distinct</literal> is used.
> <literal>n_distinct_inherited</literal> exists to allow the distinct
> estimate to be overwritten for the statistics gathered for
> inheritance
> parent tables and for partitioned tables.
How about:
"n_disinct is used for normal tables while n_distinct_inherited is used for
partitioned tables. Both are usable (selected via the ONLY modifier) for
an inheritance parent table since it has both storage and children."
The use of both "Ordinarily" and "overwritten" is bothering me here. And
it implies that n_distinct doesn't work for inheritance parent tables or,
conversely, that n_distinct does work for partitioned tables.
> I also fixed what I thought was some misleading text about ANALYZE
> using this value to calculate things.
> values in the column is linear with the estimated number
> of rows in the table; the exact count is to be computed by
multiplying the estimated
> rows in the table by the absolute value of the given number.
"...is proportional to the estimated number of rows in the table at
planning time."
(The final "at planning time" substitutes for the sentence you pondered
removing.)
(The rest, including the examples, seem a bit self-explanatory given the
definition, though I do get reader inexperience with the terminology. But
proportional implies a linear relationship, and the positive/negative
bifurcation seems straight-forward here.)
I'm thinking everything else below is better incorporated into 14.2 which
should be linking back to this section. That way the crux of the usage is
defined in syntax while the details about setting a specific value are
located in the section covering the overall topic.
the exact count is to be computed by multiplying the estimated
+ rows in the table by the absolute value of the given number. For
example,
a value of -1 implies that all values in the column are distinct,
while
- a value of -0.5 implies that each value appears twice on the average.
+ a value of -0.5 implies that each value appears twice on average.
This can be useful when the size of the table changes over time,
since
the multiplication by the number of rows in the table is not
performed
until query planning time.
(Leave: Specify a value of 0 to revert to estimating...)
That said, this rework would be OK as-is.
Also, looking at stadistinct, the multiplier stored there accounts for the
presence of null. This attribute-option does not. Is that difference
worth noting?
David J.
From | Date | Subject | |
---|---|---|---|
Next Message | jian he | 2025-10-13 01:47:17 | Re: finish TODOs in to_json_is_immutable, to_jsonb_is_immutable also add tests on it |
Previous Message | Peter Smith | 2025-10-13 01:20:29 | Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE |