Re: Show a human-readable n_distinct in pg_stats view

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Maxence Ahlouche <maxence(dot)ahlouche(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Show a human-readable n_distinct in pg_stats view
Date: 2019-03-15 10:11:07
Message-ID: 5bc8425af4ec78661477310f674b2185d86f3c9f.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Maxence Ahlouche wrote:
> It seems to me that since the pg_stats view is supposed to be
> human-readable, it would make sense to show a human-readable version
> of n_distinct.
> Currently, when the stats collector estimates that the number of
> distinct values is more than 10% of the total row count, what is
> stored in pg_statistic.stadistinct is -1 * n_distinct / totalrows, the
> rationale being that if new rows are inserted in the table, they are
> likely to introduce new values, and storing that value allows the
> stadistinct not to get stale too fast.
>
> You can find attached a simple WIP patch to show the proper n_distinct
> value in pg_stats.
>
> * Is this desired?
> * Would it make sense to add a column in the pg_stats view to display
> the information "lost", that is the fact that postgres will assume
> that inserting new rows means a higher n_distinct?
> * Am I right to assume that totalrows in the code
> (src/backend/commands/analyze.c:2170) actually corresponds to
> n_live_tup? That's what I gathered from glancing at the code, but I
> might be wrong.
> * Should the catalog version be changed for this kind of change?
> * Should I add this patch to the commitfest?
>
> If this patch is actually desired, I'll update the documentation as well.
> I'm guessing this patch would break scripts relying on the pg_stats
> view, but I do not know how much we want to avoid that, since they
> should rely on the base tables rather than on the views.

This may make things easier for those who are confused by a negative
entry, but it will obfuscate matters for those who are not.

I don't think that is a win, particularly since the semantics are
explained in great detail in the documentation of "pg_stats".

So I am -1 on that one.

Yours,
Laurenz Albe

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2019-03-15 10:41:55 Re: Offline enabling/disabling of data checksums
Previous Message John Naylor 2019-03-15 10:10:16 Re: WIP: Avoid creation of the free space map for small tables