Re: Improve docs for n_distinct_inherited

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve docs for n_distinct_inherited
Date: 2025-10-12 23:42:37
Message-ID: CAApHDvp2gSNOtzKQg8jH=j8A6jMFrE=xPr-+3z_yPd8ykL2rXQ@mail.gmail.com
Lists: pgsql-hackers

Just picking this one up again. I forgot to come back to this after PGConf.dev.

On Fri, 9 May 2025 at 02:50, David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> I was missing this key piece of knowledge which invalidated my entire attempt.
>
> Here's an attempt at shortening this now that I understand the mechanics better.
>
> Separate options exist because an inheritance parent table has two
> different sets of statistics: one considering only itself and one which
> also includes its children (<literal>n_distinct_inherited</literal>).
> Partitioned tables, which only have rows in the children, likewise uses
> the inherited option while everyone else uses <literal>n_distinct</literal>.

I wasn't quite happy with that, as the text implies that
n_distinct_inherited is itself the statistics. It's not; it's just the
option that allows some modification of the gathered statistics.

I came up with:

Ordinarily <literal>n_distinct</literal> is used.
<literal>n_distinct_inherited</literal> exists to allow the distinct
estimate to be overwritten for the statistics gathered for inheritance
parent tables and for partitioned tables.

I also fixed what I thought was some misleading text about ANALYZE
using this value to calculate things. That's not true. It's the query
planner that uses this value. ANALYZE just stores whatever this is set
to into pg_statistic. I also adjusted the text that was talking about
"the size of the table", which, as I mentioned earlier, isn't correct.
It's all related to the estimated number of rows in the table, per
"ntuples = vardata->rel->tuples;" in get_variable_numdistinct().
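
To illustrate the point about the planner doing the multiplication, here
is a rough sketch of how a stored value is turned into a distinct-value
estimate. This is a simplification, not the real
get_variable_numdistinct(): the function name sketch_numdistinct() is
made up for illustration, and the real code also clamps the result and
falls back to other heuristics when no value is stored.

```c
#include <assert.h>

/*
 * Simplified, hypothetical sketch of how a stored distinct-value setting
 * is interpreted at planning time. Not the actual PostgreSQL function.
 *
 *   stadistinct > 0   absolute number of distinct values
 *   stadistinct < 0   negated fraction of the rows; scaled by ntuples
 *   stadistinct == 0  unknown (the real planner uses other heuristics)
 *
 * ntuples stands in for the planner's estimated row count
 * (vardata->rel->tuples in the real code), not the table's size on disk.
 */
double
sketch_numdistinct(double stadistinct, double ntuples)
{
    assert(ntuples >= 0.0);

    if (stadistinct > 0.0)
        return stadistinct;             /* fixed count, no scaling */

    if (stadistinct < 0.0)
        return -stadistinct * ntuples;  /* fraction of estimated rows */

    return 0.0;                         /* unknown: caller must estimate */
}
```

So a setting of -0.5 with an estimated 1000 rows gives 500 distinct
values, while a setting of 200 stays 200 however many rows there are.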

Also fixed a typo; "twice on the average" shouldn't contain "the".

I wonder if ", since the multiplication by the number of rows in the
table is not performed until query planning time" should be deleted
since I modified the text earlier to talk about "the query planner".

David

Attachment: doc_n_distinct_inherited_v3.patch (application/octet-stream, 2.5 KB)
