Re: extended statistics: n-distinct

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: extended statistics: n-distinct
Date: 2017-03-20 23:46:27
Message-ID: CAKJS1f_i+pOOMvaXEeFh+-j_yBfu+Bv8XQdEauwLxsT0y0WzWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21 March 2017 at 08:02, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> Here is a closer to final version of the multivariate statistics series,
> last posted at
> https://www.postgresql.org/message-id/20170316222033.ncdi7nidah2gdzjx%40alvherre.pgsql

I've made another pass over the patch.

+ A notice is issued in this case. Note that there is no guarantee that
+ the existing statistics is anything like the one that would have been
+ created.

I think the final sentence would be better read as "Note that only the
name of the statistics object is considered here. The definition of
the statistics is not considered"

+ * This is not handled automatically by DROP TABLE because statistics are
+ * on their own schema.

"in their own schema" ?

+ /* now that we know the number of attributes, allocate the attribute */
+ item->attrs = (AttrNumber *) palloc0(item->nattrs * sizeof(AttrNumber));

there's still a number of palloc0() calls in mvdist.c that could be
simple palloc() calls.

+ int nmultiple,
+ summultiple;

these seem not initialised to 0 before they're used.

+int
+compare_scalars_simple(const void *a, const void *b, void *arg)

I don't see where this is used.

+int
+compare_scalars_partition(const void *a, const void *b, void *arg)

or this

+int
+multi_sort_compare_dims(int start, int end,

should have a comment

+bool
+stats_are_enabled(HeapTuple htup, char type)

missing comment too

+ appendStringInfo(&buf, "CREATE STATISTICS %s ON (",
+ quote_identifier(NameStr(statextrec->staname)));

I know I wrote this bit, but I did it like that because I didn't know
what stat types would be included. Do we need a WITH(ndistinct) or
something in there?

+ attname = get_relid_attribute_name(statextrec->starelid, attnum);

This one is my fault. It needs a quote_identifier() around it:

postgres=# create table mytable ("select" int, "insert" int);
CREATE TABLE
postgres=# create statistics mytable_stats on ("select","insert") from mytable;
CREATE STATISTICS
postgres=# select pg_get_statisticsextdef(oid) from pg_statistic_ext
where staname = 'mytable_stats';
pg_get_statisticsextdef
------------------------------------------------------------------
CREATE STATISTICS mytable_stats ON (select, insert) FROM mytable

+ while (HeapTupleIsValid(htup = systable_getnext(indscan)))
+ /* TODO maybe include only already built statistics? */
+ result = insert_ordered_oid(result, HeapTupleGetOid(htup));

+ /* XXX ensure this is true. It was broken in v25 0002 */

can now remove this comment. I see you added a check to only allow
stats on tables and MVs.

+ printfPQExpBuffer(&buf,
+ "SELECT oid, stanamespace::regnamespace AS nsp, staname, stakeys,\n"
+ " (SELECT string_agg(attname::text,', ') \n"

Might need to do something with pg_catalog.quote_ident(attname) here:

postgres=# \d mytable
Table "public.mytable"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
select | integer | | |
insert | integer | | |
Indexes:
"mytable_select_insert_idx" btree ("select", insert)
Statistics:
"public.mytable_stats" WITH (ndistinct) ON (select, insert)

notice lack of quoting on 'select'.

List *keys; /* String nodes naming referenced columns */

Are these really keys? Sounds more like something you might call it if
it were an index. Maybe it should just be "columns"? Although we need
to consider that one day they might be expressions too.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2017-03-20 23:54:20 Re: Logical replication existing data copy
Previous Message David Steele 2017-03-20 23:23:46 Re: increasing the default WAL segment size