Re: pg_stats and range statistics

From: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Egor Rogov <e(dot)rogov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stats and range statistics
Date: 2021-07-11 18:54:23
Message-ID: CAE-ML+_69xE=hhw6ZU-BT5h9mR8u8p9BGrikToKPTGthK_VmZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

This should have been added with [1].

Excerpt from the documentation:
"pg_stats is also designed to present the information in a more readable
format than the underlying catalog — at the cost that its schema must
be extended whenever new slot types are defined for pg_statistic." [2]

So, I added a reminder in pg_statistic.h.

Attached is v2 of this patch with some cosmetic changes. Renamed the columns a
bit and updated the docs to be a bit more descriptive.
(range_length_empty_frac -> empty_range_frac, range_bounds_histogram ->
range_bounds_histograms)

One question:

We do have the option of representing the histogram of lower bounds separately
from the histogram of upper bounds, as two separate view columns. Don't know if
there is much utility though and there is a fair bit of added complexity: see
below. Thoughts?

My attempts via SQL (unnest -> lower|upper -> array_agg) were futile given
unnest does not play nice with anyarray. For instance:

select unnest(stavalues1) from pg_statistic;
ERROR: cannot determine element type of "anyarray" argument

Maybe the only option is to write a UDF pg_get_{lower|upper}_bounds_histogram
which can do something similar to what calc_hist_selectivity does:

/*
* Convert histogram of ranges into histograms of its lower and upper
* bounds.
*/
nhist = hslot.nvalues;
hist_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
hist_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
for (i = 0; i < nhist; i++)
{
bool empty;

range_deserialize(rng_typcache, DatumGetRangeTypeP(hslot.values[i]),
&hist_lower[i], &hist_upper[i], &empty);
/* The histogram should not contain any empty ranges */
if (empty)
elog(ERROR, "bounds histogram contains an empty range");
}

This is looking good and ready.

[1] https://github.com/postgres/postgres/commit/918eee0c497c88260a2e107318843c9b1947bc6f
[2] https://www.postgresql.org/docs/devel/view-pg-stats.html

Regards,
Soumyadeep (VMware)

Attachment Content-Type Size
v2-pgstats.patch text/x-patch 6.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2021-07-11 19:39:24 Re: row filtering for logical replication
Previous Message Tom Lane 2021-07-11 15:28:58 Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped