|From:||Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>|
|Cc:||dean(dot)a(dot)rasheed(at)gmail(dot)com, david(dot)rowley(at)2ndquadrant(dot)com, alvherre(at)2ndquadrant(dot)com, michael(at)paquier(dot)xyz, andres(at)anarazel(dot)de, thomas(dot)munro(at)enterprisedb(dot)com, bruce(at)momjian(dot)us, hornschnorter(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org|
|Subject:||Re: [HACKERS] PATCH: multivariate histograms and MCV lists|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
At Wed, 13 Mar 2019 02:25:40 +0100, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote in <19f76496-dcf3-ccea-dd82-26fbed57b8f5(at)2ndquadrant(dot)com>
> attached is an updated version of the patch, addressing most of the
> issues raised in the recent reviews. There are two main exceptions:
> 1) I haven't reworked the regression tests to use a function to check
> cardinality estimates and making them faster.
> 2) Review handling of bitmap in statext_is_compatible_clause_internal
> when processing AND/OR/NOT clauses.
> I plan to look into those items next, but I don't want block review of
> other parts of the patch unnecessarily.
I briefly looked it and have some comments.
+ * bms_member_index
+ * determine 0-based index of the varattno in the bitmap
+ * Returns (-1) when the value is not a member.
I think the comment should be more generic.
"determine 0-based index of member x among the bitmap members"
" Returns -1 when x is not a member."
+ if (a == NULL)
+ return 0;
Isn't the case of "not a member"?
bms_member_index seems working differently than maybe expected.
bms_member_index((2, 4), 0) => 0, (I think) should be -1
bms_member_index((2, 4), 1) => 0, should be -1
bms_member_index((2, 4), 2) => 0, should be 0
bms_member_index((2, 4), 3) => 1, should be -1
bms_member_index((2, 4), 4) => 1, should be 1
bms_member_index((2, 4), 5) => 2, should be -1
bms_member_index((2, 4), 6) => 2, should be -1
bms_member_index((2, 4), 63) => 2, should be -1
bms_member_index((2, 4), 64) => -1, correct
It works correctly only when x is a member - the way the function
is maybe actually used in this patch -, or needs to change the
specifiction (or the comment) of the function.
+ if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+ * Estimate selectivity on any clauses applicable by stats tracking
+ * actual values first, then apply functional dependencies on the
+ * remaining clauses.
The comment doesn't seem needed since it is mentioning the detail
of statext_clauselist_selectivity() called just below.
+ if (statext_is_kind_built(htup, STATS_EXT_MCV))
+ StatisticExtInfo *info = makeNode(StatisticExtInfo);
+ info->statOid = statOid;
+ info->rel = rel;
+ info->kind = STATS_EXT_MCV;
+ info->keys = bms_copy(keys);
+ stainfos = lcons(info, stainfos);
We are to have four kinds of extended statistics, at worst we
have a list containing four StatisticExtInfos with the same
statOid, rel, keys and only different kind. Couldn't we reverse
the structure so that StatisticExtIbfo be something like:
> struct StatsticExtInfo
> NodeTag type;
> Oid statOid;
> RelOptInfo *rel;
! char kind; /* arbitrary.. */
> Bitmapset *keys;
+OBJS = extended_stats.o dependencies.o mcv.o mvdistinct.o
The module for MV distinctness is named 'mvdistinct', but mcv
doesn't have the prefix. I'm not sure we need to unify the
+Multivariate MCV (most-common values) lists are a straightforward extension of
"lists are *a*" is wrong?
@@ -223,26 +220,16 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,
I haven't read it in datil, but why MV-MCV patch contains (maybe)
improvement of functional dependency code?
+compare_scalars_simple(const void *a, const void *b, void *arg)
Seems to need a comment. "compare_scalars without tupnoLink maininance"?
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+ return ApplySortComparator(a, false, b, false, ssup);
This wrapper function doesn't seem to me required.
+/* simple counterpart to qsort_arg */
+bsearch_arg(const void *key, const void *base, size_t nmemb, size_t size,
We have some functions named *_bsearch. If it is really
qsort_arg's bsearch versoin, it might be better to be placed in
qsort_arg.c or new file bsearch_arg.c?
If the attrs is not offset, I'd like that it is named
differently, say, attrs_nooffset or something.
+ int i,
I'm not sure but is it following our coding convention?
+ items[i].values[j] = heap_getattr(rows[i],
items is needed by qsort_arg and as return value. It seems to me
that using just values and isnull make the code simpler there.
+ /* Look inside any binary-compatible relabeling (as in examine_variable) */
+ if (IsA(clause, RelabelType))
+ clause = (Node *) ((RelabelType *) clause)->arg;
This is quite a common locution so it's enough that the comment
just mention what it does, like "Remove any relabel
decorations". And relabelling can happen recursively so the 'if'
should be 'while'?
+ /* we also better ensure the Var is from the current level */
+ if (var->varlevelsup > 0)
+ return false;
I don't get the meaning of the "better". If it cannot/don't
accept subquery's output, it would be "we refuse Vars from ...",
or if the function is not assumed to receive such Vars, it should
be an assertion.
+ /* see if it actually has the right shape (one Var, one Const) */
+ ok = (NumRelids((Node *) expr) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
I don't think such "expression" with unidentifiable side-effect
is a good thing. Counldn't it in more plain code? (Yeah, it is
already used in clauselist_selectivity so I don't insist on
+ * This uses the function for estimating selectivity, not the operator
+ * directly (a bit awkward, but well ...).
Not only it is the right thing but actually the operators for the
type path don't have operrst.
+ * statext_is_compatible_clause
+ * Determines if the clause is compatible with MCV lists.
I think the name should contain the word "mcv". Isn't the name
better to be "staext_clause_is_mcv_compatibe"?
(Sorry, further comments may come later..)
NTT Open Source Software Center
|Next Message||Matsumura, Ryo||2019-03-13 04:35:48||RE: ECPG regression with DECLARE STATEMENT support|
|Previous Message||Amit Langote||2019-03-13 04:08:01||Re: BUG #15668: Server crash in transformPartitionRangeBounds|