From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, David Fetter <david(at)fetter(dot)org>, dean(dot)a(dot)rasheed(at)gmail(dot)com, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: multivariate statistics (v25) |
Date: | 2017-03-14 10:59:57 |
Message-ID: | CAKJS1f80E=Vh+phdKNpLs82yaaZJuO+_-ABZTGBvZGs39ndXQw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 13 March 2017 at 23:00, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:
>
> 0003:
>
> No more time today. Will try and get to those soon.
>
0003:
I've now read this patch. My main aim here was to learn what it does and
how it works. I need to spend much longer understanding how your
calculating the functional dependencies.
In the meantime I've pasted the notes I took while reading over the patch.
+ default:
+ elog(ERROR, "unexcpected statistics type requested: %d", type);
"unexpected", but we generally use "unknown".
@@ -1293,7 +1294,8 @@ get_relation_statistics(RelOptInfo *rel, Relation
relation)
info->rel = rel;
/* built/available statistics */
- info->ndist_built = true;
+ info->ndist_built = stats_are_built(htup, STATS_EXT_NDISTINCT);
+ info->deps_built = stats_are_built(htup, STATS_EXT_DEPENDENCIES);
I don't really like how this function is shaping up. You're calling
stats_are_built() potentially twice for each stats type. There must be a
nicer way to do this. Are non-built stats common enough to optimize
building a StatisticExtInfo regardless and throwing it away if it happens
to be useless?
Can you also rename mvoid to become something more esoid or similar. I seem
to always read it as m-void instead of mv-oid and naturally I expect a void
pointer rather than an Oid.
+dependencies, and for each one count the number of rows rows consistent it.
duplicate word "rows"
+Apllying the functional dependencies is fairly simple - given a list of
Applying
+In this case the default estimation based on AVIA principle happens to work
hmm, maybe I should know what AVIA principles are, but I don't. Is there
something I should be reading? I searched a bit around the internet for a
few minutes it didn't seem have a great idea either.
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
2017
+ Assert(tmp <= ((char *) output + len));
Shouldn't you just Assert(tmp == ((char *) output + len)); at the end of
the loop?
+ if (dependencies->magic != STATS_DEPS_MAGIC)
+ elog(ERROR, "invalid dependency magic %d (expected %dd)",
+ dependencies->magic, STATS_DEPS_MAGIC);
+
+ if (dependencies->type != STATS_DEPS_TYPE_BASIC)
+ elog(ERROR, "invalid dependency type %d (expected %dd)",
+ dependencies->type, STATS_DEPS_TYPE_BASIC);
%dd ?
+ Assert(dependencies->ndeps > 0);
Why Assert() and not elog() ? Wouldn't think mean that a corrupt dependency
could fail an Assert
+ dependencies = (MVDependencies) palloc0(sizeof(MVDependenciesData));
Why palloc0() and not palloc()?
Can you not just read it into a variable on the stack, then check the exact
size using tempdeps.ndeps * sizeof(MVDependency), then memcpy() it over?
That'll save you the realloc()
+ /* what minimum bytea size do we expect for those parameters */
+ expected_size = offsetof(MVDependenciesData, deps) +
+ dependencies->ndeps * (offsetof(MVDependencyData, attributes) +
+ sizeof(AttrNumber) * 2);
Can't quite make sense of this yet. Why * 2?
+ /* is the number of attributes valid? */
+ Assert((k >= 2) && (k <= STATS_MAX_DIMENSIONS));
Seems like a bad idea to Assert() this. Wouldn't some bad data being
deserialized cause an Assert failure?
+ d = (MVDependency) palloc0(offsetof(MVDependencyData, attributes) +
+ (k * sizeof(AttrNumber)));
Why palloc0(), you seem to write out all the fields right away. Seems like
a waste to zero the memory.
+ /* still within the bytea */
+ Assert(tmp <= ((char *) data + VARSIZE_ANY(data)));
Any point? You're already Asserting that you've consumed the entire array
at the end anyway.
+ appendStringInfoString(&str, "[");
appendStringInfoChar(&str. '['); would be better.
+ ret = pstrdup(str.data);
ret = pnstrdup(str.data, str.len);
+CREATE STATISTICS s1 WITH (dependencies) ON (a,a) FROM
functional_dependencies;
+ERROR: duplicate column name in statistics definition
Is it worth mentioning which column here?
I'll try to spend more time understanding 0003 soon.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | DEV_OPS | 2017-03-14 11:00:07 | Re: Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions |
Previous Message | Heikki Linnakangas | 2017-03-14 10:25:38 | Re: [PATCH] SortSupport for macaddr type |