Re: multivariate statistics (v25)

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, David Fetter <david(at)fetter(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multivariate statistics (v25)
Date: 2017-03-30 14:03:06
Message-ID: CAKJS1f-fqo97jasVF57yfVyG+=T5JLce5ynCi1vvezXxX=wgoA@mail.gmail.com
Lists: pgsql-hackers

On 25 March 2017 at 07:35, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:

> As I said in another thread, I pushed parts 0002,0003,0004. Tomas said
> he would try to rebase patches 0001,0005,0006 on top of what was
> committed. My intention is to give that one a look as soon as it is
> available. So we will have n-distinct and functional dependencies in
> PG10. It sounds unlikely that we will get MCVs and histograms in, since
> they're each a lot of code.
>

I've been working on the MV functional dependencies part of the patch to
polish it up a bit. Tomas has been busy with a few other duties.

I've made some changes around how clauselist_selectivity() determines if it
should try to apply any extended stats. The solution I came up with was to
add two parameters to this function, one for the RelOptInfo in question,
and one a bool to control if we should try to apply any extended stats.
When clauselist_selectivity() is called for join rels we just pass the rel
as NULL, so we can skip all the extended stats work with very low
overhead. When we actually have a base relation to pass along we can do so,
along with a true tryextstats value, to have the function attempt to use any
extended stats to assist with the selectivity estimation.
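
To illustrate the calling convention described above, here is a minimal
standalone sketch. It is not code from the patch: ToyRel,
toy_clauselist_selectivity() and the has_ext_stats / ext_selectivity fields
are hypothetical stand-ins, and the "extended stats" estimate is simply a
stored number. It only shows how a NULL rel or a false tryextstats falls
through to the plain per-clause product.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the planner's RelOptInfo; NOT the real struct. */
typedef struct ToyRel
{
    bool   has_ext_stats;    /* pretend: a usable extended statistic exists */
    double ext_selectivity;  /* pretend: combined estimate from that statistic */
} ToyRel;

/*
 * Toy model of the two extra parameters. Callers working on join rels pass
 * rel = NULL, so the extended-stats branch costs one pointer comparison.
 */
double
toy_clauselist_selectivity(const double *clause_sel, size_t nclauses,
                           const ToyRel *rel, bool tryextstats)
{
    double s = 1.0;
    size_t i;

    assert(clause_sel != NULL || nclauses == 0);

    /* Only attempt extended stats for a real base rel, and only if asked. */
    if (rel != NULL && tryextstats && rel->has_ext_stats)
        return rel->ext_selectivity;

    /* Fallback: multiply per-clause selectivities as if independent. */
    for (i = 0; i < nclauses; i++)
        s *= clause_sel[i];

    return s;
}
```

In this toy version the extended estimate replaces the whole product; in a
real planner only the clauses covered by the statistic would be affected.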

When adding these two parameters I had second thoughts about whether
"tryextstats" was required at all. We could just have this controlled by
whether the rel is a base rel of kind RTE_RELATION. I ended up having to pass
these parameters further down, to clauselist_selectivity's singleton
counterpart, clause_selectivity(). This was due to clause_selectivity() calling
clauselist_selectivity() for some clause types. I'm not entirely sure if
this is actually required, but I can't see any reason for it to cause
problems.

I've also attempted to simplify some of the logic within
clauselist_selectivity and some other parts of clausesel.c to remove some
unneeded code and make it a bit more efficient. For example, we no longer
count the attributes in the clause list before calling a similar function
to retrieve the actual attnums. This is now done as a single step.
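
As a rough illustration of the "single step" idea above (not the patch's
code), the sketch below collects the distinct attribute numbers and counts
them in one pass over the clauses. ToyClause and toy_collect_attnums() are
hypothetical names, and a 64-bit mask stands in for a real Bitmapset, so
only attnums 1..63 are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clause: attnum < 1 means "not a simple column reference". */
typedef struct ToyClause
{
    int attnum;
} ToyClause;

/*
 * One pass: set a bit for each referenced attnum and count the distinct
 * ones as they are first seen. Returns the count; the set itself is
 * written to *attnums_out.
 */
int
toy_collect_attnums(const ToyClause *clauses, size_t nclauses,
                    uint64_t *attnums_out)
{
    uint64_t mask = 0;
    int      count = 0;
    size_t   i;

    assert(attnums_out != NULL);

    for (i = 0; i < nclauses; i++)
    {
        int      a = clauses[i].attnum;
        uint64_t bit;

        if (a < 1 || a > 63)
            continue;           /* skip clauses this toy model cannot use */

        bit = UINT64_C(1) << a;
        if ((mask & bit) == 0)
        {
            mask |= bit;
            count++;
        }
    }

    *attnums_out = mask;
    return count;
}
```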

I've not yet quite gotten as far as I'd like with this. I'd quite like to
see clauselist_ext_split() gone, and instead we could build up a bitmapset
of clause list indexes to ignore when applying the selectivity of clauses
that couldn't use any extended stats. I'm planning on having a bit more of
a look at this tomorrow.
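
The "indexes to ignore" idea could look roughly like the following
standalone sketch. It is only an assumption about the shape of such code,
not what the patch does: IdxClause, its covered flag, ext_sel and
toy_selectivity_with_ignore_set() are all hypothetical, and a 64-bit mask
stands in for a Bitmapset. Instead of splitting the list in two, the first
pass records which clause positions were handled by extended stats, and the
second pass skips them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clause with a precomputed per-clause selectivity. */
typedef struct IdxClause
{
    double sel;      /* ordinary per-clause selectivity */
    bool   covered;  /* pretend: already estimated via an extended statistic */
} IdxClause;

/*
 * ext_sel is the (pretend) combined selectivity of all covered clauses.
 * The clause list itself is never split or copied.
 */
double
toy_selectivity_with_ignore_set(const IdxClause *clauses, size_t nclauses,
                                double ext_sel)
{
    uint64_t ignore = 0;
    double   s = 1.0;
    size_t   i;

    assert(nclauses <= 64);     /* limit of the toy bitmask */

    /* Pass 1: remember the list indexes handled by extended stats. */
    for (i = 0; i < nclauses; i++)
        if (clauses[i].covered)
            ignore |= UINT64_C(1) << i;

    if (ignore != 0)
        s *= ext_sel;

    /* Pass 2: apply the remaining clauses, skipping ignored indexes. */
    for (i = 0; i < nclauses; i++)
    {
        if (ignore & (UINT64_C(1) << i))
            continue;
        s *= clauses[i].sel;
    }

    return s;
}
```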

The attached patch should apply to master as
of f90d23d0c51895e0d7db7910538e85d3d38691f0.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
mv_functional-deps_2017-03-31.patch application/octet-stream 73.6 KB
