Re: multivariate statistics (v25)

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, David Fetter <david(at)fetter(dot)org>, dean(dot)a(dot)rasheed(at)gmail(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multivariate statistics (v25)
Date: 2017-03-16 04:36:51
Message-ID: 20170316043651.ncca27wsikoxuhc6@alvherre.pgsql
Lists: pgsql-hackers

David Rowley wrote:

> + k = -1;
> + while ((k = bms_next_member(attnums, k)) >= 0)
> + {
> +     bool attr_found = false;
> +     for (i = 0; i < info->stakeys->dim1; i++)
> +     {
> +         if (info->stakeys->values[i] == k)
> +         {
> +             attr_found = true;
> +             break;
> +         }
> +     }
> +
> +     /* found attribute not covered by this ndistinct stats, skip */
> +     if (!attr_found)
> +     {
> +         matches = false;
> +         break;
> +     }
> + }
>
> Would it be better just to stuff info->stakeys->values into a bitmapset and
> check it's a subset of attnums? It would mean allocating memory in the loop,
> so maybe you think otherwise, but in that case maybe StatisticExtInfo
> should store the bitmapset?

Yeah, I think StatisticExtInfo should have a bitmapset, not an
int2vector.
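
To illustrate why a set representation simplifies the quoted loop, here is a
minimal standalone sketch. It is not the patch's code: attr_mask,
mask_from_keys() and mask_is_subset() are hypothetical stand-ins that model a
Bitmapset as a single 64-bit word, so they only handle attribute numbers
0..63. The quoted loop tests that every member of attnums appears in
stakeys; with both sides held as sets, that becomes one subset test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: one bit per attribute number 0..63. */
typedef uint64_t attr_mask;

/*
 * Build a mask from an array of attribute numbers, the way the keys of an
 * int2vector would be converted once, outside the matching loop.
 */
attr_mask
mask_from_keys(const int16_t *keys, int nkeys)
{
    attr_mask mask = 0;

    for (int i = 0; i < nkeys; i++)
    {
        /* this toy representation cannot hold anything outside 0..63 */
        assert(keys[i] >= 0 && keys[i] < 64);
        mask |= (attr_mask) 1 << keys[i];
    }
    return mask;
}

/* True if every member of "a" is also a member of "b". */
bool
mask_is_subset(attr_mask a, attr_mask b)
{
    return (a & ~b) == 0;
}
```

If StatisticExtInfo carried the keys as a Bitmapset, the nested loop could
presumably collapse to a single bms_is_subset(attnums, keys) call, where
"keys" is an assumed field name, with no allocation inside the loop.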

> + appendPQExpBuffer(&buf, "(dependencies)");
>
> I think it's better practice to use appendPQExpBufferStr() when there's no
> formatting. It'll perform marginally better, which might not be important
> here, but it sets a better example for people to follow when performance is
> more critical.

FWIW this should have said "(ndistinct)" anyway :-)

> + change the definition of a extended statistics
>
> "a" should be "an". Also, is statistics plural here? It's commonly mixed up
> in the patch. I think it needs to be standardised. I personally think if
> you're speaking of a single pg_statistic_ext row, then it should be
> singular. Yet, I'm aware you're using plural for the CREATE STATISTICS
> command, to me that feels a bit like: CREATE TABLES mytable (); am I
> thinking wrongly somehow here?

This was discussed upthread as I recall. This is what Merriam-Webster says on
the topic:

statistic
1 : a single term or datum in a collection of statistics
2 a : a quantity (as the mean of a sample) that is computed from a sample;
specifically : estimate 3b
b : a random variable that takes on the possible values of a statistic

statistics
1 : a branch of mathematics dealing with the collection, analysis,
interpretation, and presentation of masses of numerical data
2 : a collection of quantitative data

Now, I think there's room to say that a single object created by the new CREATE
STATISTICS is really the latter, not the former. I find it very weird
that a single one of these objects is named in the plural form, though, and
it looks odd all over the place. I would rather use the term
"statistics object", and then we can continue using the singular.

> + If a schema name is given (for example, <literal>CREATE STATISTICS
> + myschema.mystat ...</>) then the statistics is created in the specified
> + schema. Otherwise it is created in the current schema. The name of
>
> What's created in the current schema? I thought this was just for naming?

Well, "created in a schema" means that the object's name lives in that
schema. So both are the same thing. Is this unclear in some way?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
