Re: Additional improvements to extended statistics

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Additional improvements to extended statistics
Date: 2020-03-09 00:01:57
Message-ID: 20200309000157.ig5tcrynvaqu4ixd@development
Lists: pgsql-hackers

On Sun, Mar 08, 2020 at 07:17:10PM +0000, Dean Rasheed wrote:
>On Fri, 6 Mar 2020 at 12:58, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> Here is a rebased version of this patch series. I've polished the first
>> two parts a bit - estimation of OR clauses and (Var op Var) clauses.
>>
>
>Hi,
>
>I've been looking over the first patch (OR list support). It mostly
>looks reasonable to me, except there's a problem with the way
>statext_mcv_clauselist_selectivity() combines multiple stat_sel values
>into the final result -- in the OR case, it needs to start with sel =
>0, and then apply the OR formula to factor in each new estimate. I.e.,
>this isn't right for an OR list:
>
> /* Factor the estimate from this MCV to the oveall estimate. */
> sel *= stat_sel;
>
>(Oh and there's a typo in that comment: s/oveall/overall/).
>
>For example, with the regression test data, this isn't estimated well:
>
> SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0;
>
>Similarly, if no extended stats can be applied it needs to return 0
>not 1, for example this query on the test data:
>
> SELECT * FROM mcv_lists WHERE a = 1 OR a = 2 OR d IS NOT NULL;
>

Ah, right. Thanks for noticing this. Attached is an updated patch series
with parts 0002 and 0003 adding tests demonstrating the issue and then
fixing it (both should be merged into 0001).
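
To illustrate the point about the starting value, here is a minimal
standalone sketch of how the per-statistic estimates might be folded
together. It is not the code from the patch: combine_selectivities() and
its parameters are hypothetical names, and it assumes the individual
estimates are independent. The OR case starts from 0 and applies
s = s1 + s2 - s1 * s2, the AND case starts from 1 and multiplies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): fold per-statistic selectivity
 * estimates into one value, assuming the estimates are independent.
 */
static double
combine_selectivities(const double *stat_sel, size_t n, bool is_or)
{
    /* Identity element: 0 for an OR list, 1 for an AND list. */
    double      sel = is_or ? 0.0 : 1.0;

    for (size_t i = 0; i < n; i++)
    {
        /* Each estimate is expected to be a probability. */
        assert(stat_sel[i] >= 0.0 && stat_sel[i] <= 1.0);

        if (is_or)
            sel = sel + stat_sel[i] - sel * stat_sel[i];    /* P(A or B) */
        else
            sel *= stat_sel[i];                             /* P(A and B) */
    }

    return sel;
}
```

With no estimates at all (n = 0) this returns 0 for an OR list and 1 for
an AND list, which matches the "return 0 not 1" case mentioned above.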

>It might also be worth adding a couple more regression test cases like these.

Agreed, 0002 adds a couple of relevant tests.

Incidentally, I've been working on improving test coverage for extended
stats over the past few days (it has ~80% of lines covered, which is
neither bad nor great). I haven't submitted that to hackers yet, because
it's mostly mechanical and it would interfere with the two existing
threads about extended stats ...

Speaking of which, would you take a look at [1]? I think supporting SAOP
is fine, but I wonder if you agree with my conclusion that we can't
really support inclusion (@>), as explained in [2].

[1] https://www.postgresql.org/message-id/flat/13902317(dot)Eha0YfKkKy(at)pierred-pdoc
[2] https://www.postgresql.org/message-id/20200202184134.swoqkqlqorqolrqv%40development

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
