From: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: multivariate statistics (v19) |
Date: | 2016-12-30 13:12:25 |
Message-ID: | 777a9240-61a4-c09b-a4c7-ff0fdcad34f9@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 12/12/16 22:50, Tomas Vondra wrote:
> On 12/12/2016 12:26 PM, Amit Langote wrote:
>>
>> Hi Tomas,
>>
>> On 2016/10/30 4:23, Tomas Vondra wrote:
>>> Hi,
>>>
>>> Attached is v20 of the multivariate statistics patch series, doing
>>> mostly
>>> the changes outlined in the preceding e-mail from October 11.
>>>
>>> The patch series currently has these parts:
>>>
>>> * 0001 : (FIX) teach pull_varno about RestrictInfo
>>> * 0002 : (PATCH) shared infrastructure and ndistinct coefficients
Hi,
I went over these two (IMHO those could easily be considered as minimal
committable set even if the user visible functionality they provide is
rather limited).
> dropping statistics
> -------------------
>
> The statistics may be dropped automatically using DROP STATISTICS.
>
> After ALTER TABLE ... DROP COLUMN, statistics referencing are:
>
> (a) dropped, if the statistics would reference only one column
>
> (b) retained, but modified on the next ANALYZE
This should be documented in user visible form if you plan to keep it
(it does make sense to me).
> + therefore perfectly correlated. Providing additional information about
> + correlation between columns is the purpose of multivariate statistics,
> + and the rest of this section thoroughly explains how the planner
> + leverages them to improve estimates.
> + </para>
> +
> + <para>
> + For additional details about multivariate statistics, see
> + <filename>src/backend/utils/mvstats/README.stats</>. There are additional
> + <literal>READMEs</> for each type of statistics, mentioned in the following
> + sections.
> + </para>
> +
> + </sect1>
I don't think this qualifies as "thoroughly explains" ;)
> +
> +Oid
> +get_statistics_oid(List *names, bool missing_ok)
No comment?
> + case OBJECT_STATISTICS:
> + msg = gettext_noop("statistics \"%s\" does not exist, skipping");
> + name = NameListToString(objname);
> + break;
This sounds somewhat weird (plural vs singular).
> + * XXX Maybe this should check for duplicate stats. Although it's not clear
> + * what "duplicate" would mean here (wheter to compare only keys or also
> + * options). Moreover, we don't do such checks for indexes, although those
> + * store tuples and recreating a new index may be a way to fix bloat (which
> + * is a problem statistics don't have).
> + */
> +ObjectAddress
> +CreateStatistics(CreateStatsStmt *stmt)
I don't think we should check duplicates TBH so I would remove the XXX
(also "wheter" is typo but if you remove that paragraph it does not matter).
> + if (true)
> + {
Huh?
> +
> +List *
> +RelationGetMVStatList(Relation relation)
> +{
...
> +
> +void
> +update_mv_stats(Oid mvoid, MVNDistinct ndistinct,
> + int2vector *attrs, VacAttrStats **stats)
...
> +static double
> +ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
> + int2vector *attrs, VacAttrStats **stats,
> + int k, int *combination)
> +{
Again, these deserve comment.
I'll try to look at other patches in the series as time permits.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2016-12-30 13:45:34 | Re: Potential data loss of 2PC files |
Previous Message | Petr Jelinek | 2016-12-30 13:05:18 | Re: multivariate statistics (v19) |