Re: pg_stats and range statistics

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Egor Rogov <e(dot)rogov(at)postgrespro(dot)ru>, Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stats and range statistics
Date: 2023-01-22 18:19:41
Message-ID: d6280c94-febf-c17d-f18b-773260089ebf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/21/23 19:53, Egor Rogov wrote:
> Hi Tomas,
>
> On 21.01.2023 00:50, Tomas Vondra wrote:
>> Hi Egor,
>>
>> While reviewing a patch improving join estimates for ranges [1] I
>> realized we don't show stats for ranges in pg_stats, and I recalled we
>> had this patch.
>>
>> I rebased the v2, and I decided to took a stab at showing separate
>> histograms for lower/upper histogram bounds. I believe it makes it way
>> more readable, which is what pg_stats is about IMHO.
>
>
> Thanks for looking into this.
>
> I have to admit it looks much better this way, so +1.
>

OK, good to hear.

>
>> This simply adds two functions, accepting/producing anyarray - one for
>> lower bounds, one for upper bounds. I don't think it can be done with a
>> plain subquery (or at least I don't know how).
>
>
> Anyarray is an alien to SQL, so functions are well justified here. What
> makes me a bit uneasy is two almost identical functions. Should we
> consider other options like a function with an additional parameter or a
> function returning an array of bounds arrays (which is somewhat
> wasteful, but probably it doesn't matter much here)?
>

I thought about that, but I think the alternatives (e.g. a single
function with a parameter determining which boundary to return). But I
don't think it's better.

Moreover, I think this is pretty similar to lower/upper, which already
work on range values. So if we have separate functions for that, we
should do the same thing here.

I renamed the functions to ranges_lower/ranges_upper, but maybe why not
to even call the functions lower/upper too?

The main trouble with the function I can think of is that we only have
anyarray type, not anyrangearray. So the functions will get called for
arbitrary array, and the check that it's array of ranges happens inside.
I'm not sure if that's a good or bad idea, or what would it take to add
a new polymorphic type ...

For now I at least kept "ranges_" to make it less likely.

>
>> Finally, it renames the empty_range_frac to start with range_, per the
>> earlier discussion. I wonder if the new column names for lower/upper
>> bounds (range_lower_bounds_histograms/range_upper_bounds_histograms) are
>> too long ...
>
>
> It seems so. The ending -s should be left out since it's a single
> histogram now. And I think that
> range_lower_histogram/range_upper_histogram are descriptive enough.
>
> I'm adding one more patch to shorten the column names, refresh the docs,
> and make 'make check' happy (unfortunately, we have to edit
> src/regress/expected/rules.out every time pg_stats definition changes).
>

Thanks. I noticed the docs were added to pg_user_mapping by mistake, not
to pg_stats. So I fixed that, and I also added the new functions.

Finally, I reordered the fields a bit - moved range_empty_frac to keep
the histogram fields together.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-Display-length-and-bounds-histograms-in-pg_-20230122.patch text/x-patch 11.9 KB
0002-tweaks-20230122.patch text/x-patch 13.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2023-01-22 18:47:07 Re: Schema variables - new implementation for Postgres 15 (typo)
Previous Message Jelte Fennema 2023-01-22 18:14:24 Re: run pgindent on a regular basis / scripted manner