Re: multivariate statistics (v19)

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multivariate statistics (v19)
Date: 2016-10-04 08:15:27
Message-ID: 9f7d5c73-71d6-fbe0-c190-b321db46f88c@iki.fi
Lists: pgsql-hackers

On 10/04/2016 10:49 AM, Dean Rasheed wrote:
> On 30 September 2016 at 12:10, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> I fear that using "statistics" as the name of the new object might get a bit
>> awkward. "statistics" is a plural, but we use it as the name of a single
>> object, like "pants" or "scissors". Not sure I have any better ideas though.
>> "estimator"? "statistics collection"? Or perhaps it should be singular,
>> "statistic". I note that you actually called the system table
>> "pg_mv_statistic", in singular.
>
> I think it's OK. The functional dependency is a single statistic, but
> MCV lists and histograms are multiple statistics (multiple facts about
> the data sampled), so in general when you create one of these new
> objects, you are creating multiple statistics about the data.

OK. I don't really have any better ideas; I was just hoping that someone
else would.

> Also I find "CREATE STATISTIC" just sounds a bit clumsy compared to
> "CREATE STATISTICS".

Agreed.

> The convention for naming system catalogs seems to be to use the
> singular for tables and plural for views, so I guess we should stick
> with that.

However, for tables and views, each object you store in those catalogs is
a "table" or "view", but with this thing, the object you store is
"statistics". Would you have a catalog table called "pg_scissor"?

We call the current system table "pg_statistic", though. I agree we
should call the new table pg_mv_statistic, in singular, to follow the
example of pg_statistic.

Of course, the user-friendly system view on top of that is called
"pg_stats", just to confuse things more :-).

> It doesn't seem like the end of the world that it doesn't
> match the user-facing syntax. A bigger concern is the use of "mv" in
> the name, because as has already been pointed out, this table may also
> in the future be used to store univariate expression and partial
> statistics, so I think we should drop the "mv" and go with something
> like pg_statistic_ext, or some other more general name.

Also, "mv" makes me think of materialized views, which is completely
unrelated to this.

- Heikki
