Re: multivariate statistics (v25)

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: david(dot)rowley(at)2ndquadrant(dot)com
Cc: alvherre(at)2ndquadrant(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, david(at)fetter(dot)org, dean(dot)a(dot)rasheed(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: multivariate statistics (v25)
Date: 2017-03-31 08:18:21
Message-ID: 20170331.171821.165659478.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

At Fri, 31 Mar 2017 03:03:06 +1300, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote in <CAKJS1f-fqo97jasVF57yfVyG+=T5JLce5ynCi1vvezXxX=wgoA(at)mail(dot)gmail(dot)com>
> On 25 March 2017 at 07:35, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
> > As I said in another thread, I pushed parts 0002,0003,0004. Tomas said
> > he would try to rebase patches 0001,0005,0006 on top of what was
> > committed. My intention is to give that one a look as soon as it is
> > available. So we will have n-distinct and functional dependencies in
> > PG10. It sounds unlikely that we will get MCVs and histograms in, since
> > they're each a lot of code.
> >
>
> I've been working on the MV functional dependencies part of the patch to
> polish it up a bit. Tomas has been busy with a few other duties.
>
> I've made some changes around how clauselist_selectivity() determines if it
> should try to apply any extended stats. The solution I came up with was to
> add two parameters to this function, one for the RelOptInfo in question,
> and one a bool to control if we should try to apply any extended stats.
> For clauselist_selectivity() usage involving join rels we just pass the rel
> as NULL, that way we can skip all the extended stats stuff with very low
> overhead. When we actually have a base relation to pass along we can do so,
> along with a true tryextstats value to have the function attempt to use any
> extended stats to assist with the selectivity estimation.
>
> When adding these two parameters I had 2nd thoughts that the "tryextstats"
> was required at all. We could just have this controlled by if the rel is a
> base rel of kind RTE_RELATION. I ended up having to pass these parameters
> further, down to clauselist_selectivity's singleton couterpart,
> clause_selectivity(). This was due to clause_selectivity() calling
> clauselist_selectivity() for some clause types. I'm not entirely sure if
> this is actually required, but I can't see any reason for it to cause
> problems.

I understand that the reason for tryextstats is that the two are
perfectly correlating but caluse_selectivity requires the
RelOptInfo anyway. Some comment about that may be reuiqred in the
function comment.

> I've also attempted to simplify some of the logic within
> clauselist_selectivity and some other parts of clausesel.c to remove some
> unneeded code and make it a bit more efficient. For example, we no longer
> count the attributes in the clause list before calling a similar function
> to retrieve the actual attnums. This is now done as a single step.
>
> I've not yet quite gotten as far as I'd like with this. I'd quite like to
> see clauselist_ext_split() gone, and instead we could build up a bitmapset
> of clause list indexes to ignore when applying the selectivity of clauses
> that couldn't use any extended stats. I'm planning on having a bit more of
> a look at this tomorrow.
>
> The attached patch should apply to master as
> of f90d23d0c51895e0d7db7910538e85d3d38691f0.

FWIW, I tries this. This cleanly applied on it but make ends with
the following error.

$ make -s
Writing postgres.bki
Writing schemapg.h
Writing postgres.description
Writing postgres.shdescription
Writing fmgroids.h
Writing fmgrprotos.h
Writing fmgrtab.c
make[3]: *** No rule to make target `dependencies.o', needed by `objfiles.txt'. Stop.
make[2]: *** [statistics-recursive] Error 2
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

Some random comments by just looking on the patch:

======
The name of the function "collect_ext_attnums", and
"clause_is_ext_compatible" seems odd since "ext" doesn't seem to
be a part of "extended statistics". Some other names looks the
same, too.

Something like "collect_e(xt)stat_compatible_attnums" and
"clause_is_e(xt)stat_compatible" seem better to me.

======
The following comment seems something wrong.

+ * When applying functional dependencies, we start with the strongest ones
+ * strongest dependencies. That is, we select the dependency that:

======
dependency_is_fully_matched() is not found. Maybe some other
patches are assumed?

======
+ /* see if it actually has the right */
+ ok = (NumRelids((Node *) expr) == 1) &&
+ (is_pseudo_constant_clause(lsecond(expr->args)) ||
+ (varonleft = false,
+ is_pseudo_constant_clause(linitial(expr->args))));
+
+ /* unsupported structure (two variables or so) */
+ if (!ok)
+ return true;

Ok is used only here. I don't think seeming-expressions with side
effect is not good idea here.

======
+ switch (get_oprrest(expr->opno))
+ {
+ case F_EQSEL:
+
+ /* equality conditions are compatible with all statistics */
+ break;
+
+ default:
+
+ /* unknown estimator */
+ return true;
+ }

This seems somewhat stupid..

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-03-31 08:33:26 Re: Partitioned tables and relfilenode
Previous Message Andreas Karlsson 2017-03-31 08:12:13 Re: REINDEX CONCURRENTLY 2.0