Re: multivariate statistics (v19)

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multivariate statistics (v19)
Date: 2016-12-30 13:12:25
Message-ID: 777a9240-61a4-c09b-a4c7-ff0fdcad34f9@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/12/16 22:50, Tomas Vondra wrote:
> On 12/12/2016 12:26 PM, Amit Langote wrote:
>>
>> Hi Tomas,
>>
>> On 2016/10/30 4:23, Tomas Vondra wrote:
>>> Hi,
>>>
>>> Attached is v20 of the multivariate statistics patch series, doing
>>> mostly
>>> the changes outlined in the preceding e-mail from October 11.
>>>
>>> The patch series currently has these parts:
>>>
>>> * 0001 : (FIX) teach pull_varno about RestrictInfo
>>> * 0002 : (PATCH) shared infrastructure and ndistinct coefficients

Hi,

I went over these two (IMHO those could easily be considered as minimal
committable set even if the user visible functionality they provide is
rather limited).

> dropping statistics
> -------------------
>
> The statistics may be dropped automatically using DROP STATISTICS.
>
> After ALTER TABLE ... DROP COLUMN, statistics referencing are:
>
> (a) dropped, if the statistics would reference only one column
>
> (b) retained, but modified on the next ANALYZE

This should be documented in user visible form if you plan to keep it
(it does make sense to me).

> + therefore perfectly correlated. Providing additional information about
> + correlation between columns is the purpose of multivariate statistics,
> + and the rest of this section thoroughly explains how the planner
> + leverages them to improve estimates.
> + </para>
> +
> + <para>
> + For additional details about multivariate statistics, see
> + <filename>src/backend/utils/mvstats/README.stats</>. There are additional
> + <literal>READMEs</> for each type of statistics, mentioned in the following
> + sections.
> + </para>
> +
> + </sect1>

I don't think this qualifies as "thoroughly explains" ;)

> +
> +Oid
> +get_statistics_oid(List *names, bool missing_ok)

No comment?

> + case OBJECT_STATISTICS:
> + msg = gettext_noop("statistics \"%s\" does not exist, skipping");
> + name = NameListToString(objname);
> + break;

This sounds somewhat weird (plural vs singular).

> + * XXX Maybe this should check for duplicate stats. Although it's not clear
> + * what "duplicate" would mean here (wheter to compare only keys or also
> + * options). Moreover, we don't do such checks for indexes, although those
> + * store tuples and recreating a new index may be a way to fix bloat (which
> + * is a problem statistics don't have).
> + */
> +ObjectAddress
> +CreateStatistics(CreateStatsStmt *stmt)

I don't think we should check duplicates TBH so I would remove the XXX
(also "wheter" is typo but if you remove that paragraph it does not matter).

> + if (true)
> + {

Huh?

> +
> +List *
> +RelationGetMVStatList(Relation relation)
> +{
...
> +
> +void
> +update_mv_stats(Oid mvoid, MVNDistinct ndistinct,
> + int2vector *attrs, VacAttrStats **stats)
...
> +static double
> +ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
> + int2vector *attrs, VacAttrStats **stats,
> + int k, int *combination)
> +{

Again, these deserve comment.

I'll try to look at other patches in the series as time permits.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-12-30 13:45:34 Re: Potential data loss of 2PC files
Previous Message Petr Jelinek 2016-12-30 13:05:18 Re: multivariate statistics (v19)