Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tomas(dot)vondra(at)2ndquadrant(dot)com
Cc: dean(dot)a(dot)rasheed(at)gmail(dot)com, david(dot)rowley(at)2ndquadrant(dot)com, alvherre(at)2ndquadrant(dot)com, michael(at)paquier(dot)xyz, andres(at)anarazel(dot)de, thomas(dot)munro(at)enterprisedb(dot)com, bruce(at)momjian(dot)us, hornschnorter(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-03-13 04:19:51
Message-ID: 20190313.131951.178195776.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Wed, 13 Mar 2019 02:25:40 +0100, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote in <19f76496-dcf3-ccea-dd82-26fbed57b8f5(at)2ndquadrant(dot)com>
> Hi,
>
> attached is an updated version of the patch, addressing most of the
> issues raised in the recent reviews. There are two main exceptions:
>
> 1) I haven't reworked the regression tests to use a function to check
> cardinality estimates and making them faster.
>
> 2) Review handling of bitmap in statext_is_compatible_clause_internal
> when processing AND/OR/NOT clauses.
>
> I plan to look into those items next, but I don't want block review of
> other parts of the patch unnecessarily.

I briefly looked it and have some comments.

0001-multivariate-MCV-lists-20190312.patch

+/*
+ * bms_member_index
+ * determine 0-based index of the varattno in the bitmap
+ *
+ * Returns (-1) when the value is not a member.

I think the comment should be more generic.

"determine 0-based index of member x among the bitmap members"
" Returns -1 when x is not a member."

(cont'ed)
+ if (a == NULL)
+ return 0;

Isn't the case of "not a member"?

bms_member_index seems working differently than maybe expected.

bms_member_index((2, 4), 0) => 0, (I think) should be -1
bms_member_index((2, 4), 1) => 0, should be -1
bms_member_index((2, 4), 2) => 0, should be 0
bms_member_index((2, 4), 3) => 1, should be -1
bms_member_index((2, 4), 4) => 1, should be 1
bms_member_index((2, 4), 5) => 2, should be -1
bms_member_index((2, 4), 6) => 2, should be -1
...
bms_member_index((2, 4), 63) => 2, should be -1
bms_member_index((2, 4), 64) => -1, correct

It works correctly only when x is a member - the way the function
is maybe actually used in this patch -, or needs to change the
specifiction (or the comment) of the function.

+ if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+ {
+ /*
+ * Estimate selectivity on any clauses applicable by stats tracking
+ * actual values first, then apply functional dependencies on the
+ * remaining clauses.

The comment doesn't seem needed since it is mentioning the detail
of statext_clauselist_selectivity() called just below.

+ if (statext_is_kind_built(htup, STATS_EXT_MCV))
+ {
+ StatisticExtInfo *info = makeNode(StatisticExtInfo);
+
+ info->statOid = statOid;
+ info->rel = rel;
+ info->kind = STATS_EXT_MCV;
+ info->keys = bms_copy(keys);
+
+ stainfos = lcons(info, stainfos);
+ }

We are to have four kinds of extended statistics, at worst we
have a list containing four StatisticExtInfos with the same
statOid, rel, keys and only different kind. Couldn't we reverse
the structure so that StatisticExtIbfo be something like:

> struct StatsticExtInfo
> {
> NodeTag type;
> Oid statOid;
> RelOptInfo *rel;
! char kind[8]; /* arbitrary.. */
> Bitmapset *keys;

+OBJS = extended_stats.o dependencies.o mcv.o mvdistinct.o

The module for MV distinctness is named 'mvdistinct', but mcv
doesn't have the prefix. I'm not sure we need to unify the
names, though.

+Multivariate MCV (most-common values) lists are a straightforward extension of

"lists are *a*" is wrong?

@@ -223,26 +220,16 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,

I haven't read it in datil, but why MV-MCV patch contains (maybe)
improvement of functional dependency code?

+int
+compare_scalars_simple(const void *a, const void *b, void *arg)

Seems to need a comment. "compare_scalars without tupnoLink maininance"?

+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+ return ApplySortComparator(a, false, b, false, ssup);
+}

This wrapper function doesn't seem to me required.

+/* simple counterpart to qsort_arg */
+void *
+bsearch_arg(const void *key, const void *base, size_t nmemb, size_t size,

We have some functions named *_bsearch. If it is really
qsort_arg's bsearch versoin, it might be better to be placed in
qsort_arg.c or new file bsearch_arg.c?

+int *
+build_attnums_array(Bitmapset *attrs)

If the attrs is not offset, I'd like that it is named
differently, say, attrs_nooffset or something.

+ int i,
+ j,
+ len;

I'm not sure but is it following our coding convention?

+ items[i].values[j] = heap_getattr(rows[i],

items is needed by qsort_arg and as return value. It seems to me
that using just values[] and isnull[] make the code simpler there.

+ /* Look inside any binary-compatible relabeling (as in examine_variable) */
+ if (IsA(clause, RelabelType))
+ clause = (Node *) ((RelabelType *) clause)->arg;

This is quite a common locution so it's enough that the comment
just mention what it does, like "Remove any relabel
decorations". And relabelling can happen recursively so the 'if'
should be 'while'?

+ /* we also better ensure the Var is from the current level */
+ if (var->varlevelsup > 0)
+ return false;

I don't get the meaning of the "better". If it cannot/don't
accept subquery's output, it would be "we refuse Vars from ...",
or if the function is not assumed to receive such Vars, it should
be an assertion.

+ /* see if it actually has the right shape (one Var, one Const) */
+ ok = (NumRelids((Node *) expr) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));

I don't think such "expression" with unidentifiable side-effect
is a good thing. Counldn't it in more plain code? (Yeah, it is
already used in clauselist_selectivity so I don't insist on
that.)

+ * This uses the function for estimating selectivity, not the operator
+ * directly (a bit awkward, but well ...).

Not only it is the right thing but actually the operators for the
type path don't have operrst.

+ * statext_is_compatible_clause
+ * Determines if the clause is compatible with MCV lists.

I think the name should contain the word "mcv". Isn't the name
better to be "staext_clause_is_mcv_compatibe"?

(Sorry, further comments may come later..)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matsumura, Ryo 2019-03-13 04:35:48 RE: ECPG regression with DECLARE STATEMENT support
Previous Message Amit Langote 2019-03-13 04:08:01 Re: BUG #15668: Server crash in transformPartitionRangeBounds