From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: multivariate statistics (v19) |
Date: | 2017-01-26 09:43:57 |
Message-ID: | CAFiTN-sdJmUYiARd0qBQKBXqs73096qD0Hd1QrNs=JQ3F_0QGQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jan 5, 2017 at 3:27 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Thanks. Those plans match my experiments with the TPC-H data set, although
> I've been playing with the smallest scale (1GB).
>
> It's not very difficult to make the estimation error arbitrary large, e.g.
> by using perfectly correlated (identical) columns.
I have done an initial review for ndistint and histogram patches,
there are few review comments.
ndistinct
---------
1. Duplicate statistics:
postgres=# create statistics s with (ndistinct) on (a,c) from t;
2017-01-07 16:21:54.575 IST [63817] ERROR: duplicate key value
violates unique constraint "pg_mv_statistic_name_index"
2017-01-07 16:21:54.575 IST [63817] DETAIL: Key (staname,
stanamespace)=(s, 2200) already exists.
2017-01-07 16:21:54.575 IST [63817] STATEMENT: create statistics s
with (ndistinct) on (a,c) from t;
ERROR: duplicate key value violates unique constraint
"pg_mv_statistic_name_index"
DETAIL: Key (staname, stanamespace)=(s, 2200) already exists.
For duplicate statistics, I think we can check the existence of the
statistics and give more meaningful error code something statistics
"s" already exist.
2. Typo
+ /*
+ * Sort the attnums, which makes detecting duplicies somewhat
+ * easier, and it does not hurt (it does not affect the efficiency,
+ * onlike for indexes, for example).
+ */
/onlike/unlike
3. Typo
/*
* Find attnims of MV stats using the mvoid.
*/
int2vector *
find_mv_attnums(Oid mvoid, Oid *relid)
/attnims/attnums
histograms
--------------
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;
Isn't it will be better that take some percentage of the bucket based
on the number of distinct element for partial matching buckets.
+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+ int2vector *stakeys,
+ MVSerializedHistogram mvhist,
+ int nmatches, char *matches,
+ bool is_or)
+{
+ int i;
For each clause we are processing all the buckets, can't we use some
data structure which can make multi-dimensions information searching
faster.
Something like HTree, RTree, Maybe storing histogram in these formats
will be difficult?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2017-01-26 09:49:51 | Re: Transactions involving multiple postgres foreign servers |
Previous Message | Kyotaro HORIGUCHI | 2017-01-26 09:12:30 | Re: Floating point comparison inconsistencies of the geometric types |