From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PoC/WIP: Extended statistics on expressions |
Date: | 2021-01-18 15:48:28 |
Message-ID: | CAEZATCVnQccV_Z7AuvbvquaOYxUj+cVFk48pPGASYHFjXWgW5g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Looking through extended_stats.c, I found a corner case that can lead
to a seg-fault:
CREATE TABLE foo();
CREATE STATISTICS s ON (1) FROM foo;
ANALYSE foo;
This crashes in lookup_var_attr_stats(), because it isn't expecting
nvacatts to be 0. I can't think of any case where building stats on a
table with no analysable columns is useful, so it should probably just
exit early in that case.
In BuildRelationExtStatistics(), it looks like min_attrs should be
declared assert-only.
In evaluate_expressions():
+ /* set the pointers */
+ result = (ExprInfo *) ptr;
+ ptr += sizeof(ExprInfo);
I think that should probably have a MAXALIGN().
A slightly bigger issue that I don't like is the way it assigns
attribute numbers for expressions starting from
MaxHeapAttributeNumber+1, so the first expression has an attnum of
1601. That leads to pretty inefficient use of Bitmapsets, since most
tables only contain a handful of columns, and there's a large block of
unused space in the middle the Bitmapset.
An alternative approach might be to use regular attnums for columns
and use negative indexes -1, -2, -3, ... for expressions in the stored
stats. Then when storing and retrieving attnums from Bitmapsets, it
could just offset by STATS_MAX_DIMENSIONS (8) to avoid negative values
in the Bitmapsets, since there can't be more than that many
expressions (just like other code stores system attributes using
FirstLowInvalidHeapAttributeNumber).
That would be a somewhat bigger change, but hopefully fairly
mechanical, and then some code like add_expressions_to_attributes()
would go away.
Looking at the new view pg_stats_ext_exprs, I noticed that it fails to
show expressions until the statistics have been built. For example:
CREATE TABLE foo(a int, b int);
CREATE STATISTICS s ON (a+b), (a*b) FROM foo;
SELECT statistics_name, tablename, expr, n_distinct FROM pg_stats_ext_exprs;
statistics_name | tablename | expr | n_distinct
-----------------+-----------+------+------------
s | foo | |
(1 row)
but after populating and analysing the table, this becomes:
statistics_name | tablename | expr | n_distinct
-----------------+-----------+---------+------------
s | foo | (a + b) | 11
s | foo | (a * b) | 11
(2 rows)
I think it should show the expressions even before the stats have been built.
Another issue is that it returns rows for non-expression stats as
well. For example:
CREATE TABLE foo(a int, b int);
CREATE STATISTICS s ON a, b FROM foo;
SELECT statistics_name, tablename, expr, n_distinct FROM pg_stats_ext_exprs;
statistics_name | tablename | expr | n_distinct
-----------------+-----------+------+------------
s | foo | |
(1 row)
and those values will never be populated, since they're not
expressions, so I would expect them to not be shown in the view.
So basically, instead of
+ LEFT JOIN LATERAL (
+ SELECT
+ *
+ FROM (
+ SELECT
+
unnest(pg_get_statisticsobjdef_expressions(s.oid)) AS expr,
+ unnest(sd.stxdexpr)::pg_statistic AS a
+ ) x
+ ) stat ON sd.stxdexpr IS NOT NULL;
perhaps just
+ JOIN LATERAL (
+ SELECT
+ *
+ FROM (
+ SELECT
+
unnest(pg_get_statisticsobjdef_expressions(s.oid)) AS expr,
+ unnest(sd.stxdexpr)::pg_statistic AS a
+ ) x
+ ) stat ON true;
Regards,
Dean
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2021-01-18 15:50:37 | Re: Key management with tests |
Previous Message | Bruce Momjian | 2021-01-18 15:43:58 | Re: Key management with tests |