Re: pg_stats and range statistics

From: Egor Rogov <e(dot)rogov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_stats and range statistics
Date: 2023-01-24 07:35:59
Message-ID: 718126ae-c6f5-39ea-0c56-6089a2e9e54d@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23.01.2023 13:01, Egor Rogov wrote:

> On 23.01.2023 02:21, Tomas Vondra wrote:
>> On 1/22/23 22:33, Justin Pryzby wrote:
>>> On Sun, Jan 22, 2023 at 07:19:41PM +0100, Tomas Vondra wrote:
>>>> On 1/21/23 19:53, Egor Rogov wrote:
>>>>> Hi Tomas,
>>>>> On 21.01.2023 00:50, Tomas Vondra wrote:
>>>>>> This simply adds two functions, accepting/producing anyarray -
>>>>>> one for
>>>>>> lower bounds, one for upper bounds. I don't think it can be done
>>>>>> with a
>>>>>> plain subquery (or at least I don't know how).
>>>>> Anyarray is an alien to SQL, so functions are well justified here.
>>>>> What
>>>>> makes me a bit uneasy is two almost identical functions. Should we
>>>>> consider other options like a function with an additional
>>>>> parameter or a
>>>>> function returning an array of bounds arrays (which is somewhat
>>>>> wasteful, but probably it doesn't matter much here)?
>>>>>
>>>> I thought about that, but I think the alternatives (e.g. a single
>>>> function with a parameter determining which boundary to return). But I
>>>> don't think it's better.
>>> What about a common function, maybe called like:
>>>
>>> ranges_upper_bounds(PG_FUNCTION_ARGS)
>>> {
>>>      AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
>>>      Oid         element_type = AARR_ELEMTYPE(array);
>>>      TypeCacheEntry *typentry;
>>>
>>>      /* Get information about range type; note column might be a
>>> domain */
>>>      typentry = range_get_typcache(fcinfo, getBaseType(element_type));
>>>
>>>      return ranges_bounds_common(typentry, array, false);
>>> }
>>>
>>> That saves 40 LOC.
>>>
>> Thanks, that's better. But I'm still not sure it's a good idea to add
>> function with anyarray argument, when we need it to be an array of
>> ranges ...
>>
>> I wonder if we have other functions doing something similar, i.e.
>> accepting a polymorphic type and then imposing additional restrictions
>> on it.
>
>
> I couldn't find such examples, but adding an adhoc polymorphic type
> just doesn't look right for me. Besides, you'll end up adding not just
> anyrangearray type, but also anymultirangearray,
> anycompatiblerangearray, anycompatiblemultirangearray, and maybe their
> "non"-counterparts like anynonrangearray, and all of these are not of
> much use. And one day you may need an array of arrays or something...
>
> I wonder if it's possible to teach SQL to work with anyarray type - at
> runtime the actual type of anyarray elements is known, right? In fact,
> unnest() alone is enough to eliminate the need of C functions altogether.

When started to look at how we deal with anyarray columns, I came across
the following comment in parse_coerce.c for
enforce_generic_type_consistency():

* A special case is that we could see ANYARRAY as an actual_arg_type even
 * when allow_poly is false (this is possible only because pg_statistic has
 * columns shown as anyarray in the catalogs).

It makes me realize how anyarray as-a-real-type is specific to
pg_statistic. Even if it's possible to somehow postpone type inference
for this case from parse time to execute time, it clearly doesn't worth
the effort.

So, I am for the simplest possible approach, that is, the two proposed
functions ranges_upper(anyarray) and ranges_lower(anyarray). I am not
even sure if it's worth documenting them, as they are very
pg_statistic-specific and likely won't be useful for end users.

>
>
>>> Shouldn't this add some sql tests ?
>>>
>> Yeah, I guess we should have a couple tests calling these functions on
>> different range arrays.
>>
>> This reminds me lower()/upper() have some extra rules about handling
>> empty ranges / infinite boundaries etc. These functions should behave
>> consistently (as if we called lower() in a loop) and I'm pretty sure
>> that's not the current state.
>
>
> I can try to tidy things up, but first we need to decide on the
> general approach.
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2023-01-24 08:22:03 Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Previous Message Peter Smith 2023-01-24 07:18:35 Re: Perform streaming logical transactions by background workers and parallel apply