Re: PoC/WIP: Extended statistics on expressions

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2020-11-24 17:29:05
Message-ID: 28bfe53b-0e75-79c1-3403-ebf10714a94e@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/24/20 5:23 PM, Justin Pryzby wrote:
> On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
>> 0004 - Seems fine. IMHO not really "silly errors" but OK.
>
> This is one of the same issues you pointed out - shadowing a variable.
> Could be backpatched.
>
> On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
>>> + errmsg("statistics expressions and predicates can refer only to the table being indexed")));
>>> + * partial-index predicates. Create it in the per-index context to be
>>>
>>> I think these are copied and shouldn't mention "indexes" or "predicates". Or
>>> should statistics support predicates, too ?
>>>
>>
>> Right. Stupid copy-pasto.
>
> Right, but then I was wondering if CREATE STATS should actually support
> predicates, since one use case is to do what indexes do without their overhead.
> I haven't thought about it enough yet.
>

Well, it's not supported now, so the message is bogus. I'm not against
supporting "partial statistics" with predicates in the future, but it's
going to be non-trivial project on it's own. It's not something I can
bolt onto the current patch easily.

>> 0006 - Not sure. I think CreateStatistics can be fixed with less code,
>> keeping it more like PG13 (good for backpatching). Not sure why rename
>> extended statistics to multi-variate statistics - we use "extended"
>> everywhere.
>
> - if (build_expressions && (list_length(stxexprs) == 0))
> + if (!build_expressions_only && (list_length(stmt->exprs) < 2))
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> - errmsg("extended expression statistics require at least one expression")));
> + errmsg("multi-variate statistics require at least two columns")));
>
> I think all of "CREATE STATISTICS" has been known as "extended stats", so I
> think it may be confusing to say that it requires two columns for the general
> facility.
>
>> Not sure what's the point of serialize_expr_stats changes,
>> that's code is mostly copy-paste from update_attstats.
>
> Right. I think "i" is poor variable name when it isn't a loop variable and not
> of limited scope.
>

OK, I understand. I'll consider tweaking that.

>> 0007 - I suspect this makes the pg_stats_ext too complex to work with,
>> IMHO we should move this to a separate view.
>
> Right - then unnest() the whole thing and return one row per expression rather
> than array, as you've done. Maybe the docs should say that this returns one
> row per expression.
>
> Looking quickly at your new patch: I guess you know there's a bunch of
> lingering references to "indexes" and "predicates":
>
> I don't know if you want to go to the effort to prohibit this.
> postgres=# CREATE STATISTICS asf ON (tableoid::int+1) FROM t;
> CREATE STATISTICS
>

Hmm, we're already rejecting system attributes, I suppose we should do
the same thing for expressions on system attributes.

> I think a lot of people will find this confusing:
>
> postgres=# CREATE STATISTICS asf ON i FROM t;
> ERROR: extended statistics require at least 2 columns
> postgres=# CREATE STATISTICS asf ON (i) FROM t;
> CREATE STATISTICS
>
> postgres=# CREATE STATISTICS asf (expressions) ON i FROM t;
> ERROR: extended expression statistics require at least one expression
> postgres=# CREATE STATISTICS asf (expressions) ON (i) FROM t;
> CREATE STATISTICS
>
> I haven't looked, but is it possible to make it work without parens ?
>

Hmm, you're right that may be surprising. I suppose we could walk the
expressions while creating the statistics, and replace such trivial
expressions with the nested variable, but I haven't tried. I wonder what
the CREATE INDEX behavior would be in these cases.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-11-24 17:29:41 Re: Strange behavior with polygon and NaN
Previous Message Robert Haas 2020-11-24 17:15:42 Re: [HACKERS] Custom compression methods