Re: PoC/WIP: Extended statistics on expressions

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2021-01-05 00:45:05
Message-ID: 329203a4-7cad-a0e3-d5a8-ce70872c8156@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/4/21 4:34 PM, Dean Rasheed wrote:
>
> ...
>
> Some other comments:
>
> * I'm not sure I understand the need for 0001. Wasn't there an earlier
> version of this patch that just did it by re-populating the type
> array, but which still had it as an array rather than turning it into
> a list? Making it a list falsifies some of the comments and
> function/variable name choices in that file.
>

That's a bit done to Justin - I think it's fine to use the older version
repopulating the type array, but that question is somewhat unrelated to
this patch.

> * There's a comment typo in catalog/Makefile -- "are are reputedly
> other...", should be "there are reputedly other...".
>
> * Looking at the pg_stats_ext view, I think perhaps expressions stats
> should be omitted entirely from that view, since it doesn't show any
> useful information about them. So it could remove "e" from the "kinds"
> array, and exclude rows whose only kind is "e", since such rows have
> no interesting data in them. Essentially, the new view
> pg_stats_ext_exprs makes having any expression stats in pg_stats_ext
> redundant. Hiding this data in pg_stats_ext would also be consistent
> with making the "expressions" stats kind hidden from the user.
>

Hmmm, not sure. I'm not sure removing 'e' from the array is a good idea.
On the one hand it's internal detail, on the other hand most of that
view is internal detail too. Excluding rows with only 'e' seems
reasonable, though. I need to think about this.

> * In gram.y, it wasn't quite obvious why you converted the column list
> for CREATE STATISTICS from an expr_list to a stats_params list. I
> figured it out, and it makes sense, but I think it could use a
> comment, perhaps something along the lines of the one for index_elem,
> e.g.:
>
> /*
> * Statistics attributes can be either simple column references, or arbitrary
> * expressions in parens. For compatibility with index attributes permitted
> * in CREATE INDEX, we allow an expression that's just a function call to be
> * written without parens.
> */
>

OH, right. I'd have trouble figuring this myself, and I wrote that code
myself only one or two months ago.

> * In parse_func.c and parse_agg.c, there are a few new error strings
> that use the abbreviation "stats expressions", whereas most other
> errors refer to "statistics expressions". For consistency, I think
> they should all be the latter.
>

OK, will fix.

> * In generateClonedExtStatsStmt(), I think the "expressions" stats
> kind needs to be explicitly excluded, otherwise CREATE TABLE (LIKE
> ...) fails if the source table has expression stats.
>

Yeah, will fix. I guess this also means we're missing some tests.

> * CreateStatistics() uses ShareUpdateExclusiveLock, but in
> tcop/utility.c the relation is opened with a ShareLock. ISTM that the
> latter lock mode should be made to match CreateStatistics().
>

Not sure, will check.

> * Why does the new code in tcop/utility.c not use
> RangeVarGetRelidExtended together with RangeVarCallbackOwnsRelation?
> That seems preferable to doing the ACL check in CreateStatistics().
> For one thing, as it stands, it allows the lock to be taken even if
> the user doesn't own the table. Is it intentional that the current
> code allows extended stats to be created on system catalogs? That
> would be one thing that using RangeVarCallbackOwnsRelation would
> change, but I can't see a reason to allow it.
>

I think I copied the code from somewhere - probably expression indexes,
or something like that. Not a proof that it's the right/better way to do
this, though.

> * In src/bin/psql/describe.c, I think the \d output should also
> exclude the "expressions" stats kind and just list the other kinds (or
> have no kinds list at all, if there are no other kinds), to make it
> consistent with the CREATE STATISTICS syntax.
>

Not sure I understand. Why would this make it consistent with CREATE
STATISTICS? Can you elaborate?

> * The pg_dump output for a stats object whose only kind is
> "expressions" is broken -- it includes a spurious "()" for the kinds
> list.
>

Will fix. Again, this suggests there are TAP tests missing.

> That's it for now. I'll look at the optimiser changes next, and try to
> post more comments later this week.
>

Thanks!

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hou, Zhijie 2021-01-05 00:54:29 fix typo in ReorderBufferProcessTXN
Previous Message Dagfinn Ilmari Mannsåker 2021-01-05 00:44:13 Re: language cleanups in code and docs