Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-09-10 22:12:36
Message-ID: 3fcfd5e5-6849-34e6-22ab-1b62d191bedb@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 07/18/2018 09:32 AM, Konstantin Knizhnik wrote:
>
>
> On 18.07.2018 02:58, Tomas Vondra wrote:
>> On 07/18/2018 12:41 AM, Konstantin Knizhnik wrote:
>>> ...
>>>
>>> Teodor Sigaev has proposed an alternative approach for calculating
>>> selectivity of multicolumn join or compound index search.
>>> Usually DBA creates compound indexes which can be used  by optimizer to
>>> build efficient query execution plan based on index search.
>>> We can stores statistic for compound keys of such indexes and (as it is
>>> done now for functional indexes) and use it to estimate selectivity
>>> of clauses. I have implemented this idea and will publish patch in
>>> separate thread soon.
>>> Now I just want to share some results for the Tomas examples.
>>>
>>> So for Vanilla Postges without extra statistic  estimated number of rows
>>> is about 4 times smaller than real.
>>>
>> Can you please post plans with parallelism disabled, and perhaps without
>> the aggregate? Both makes reading the plans unnecessarily difficult ...
>
>
> Sorry, below are plans with disabled parallel execution on simpler
> query(a=1 and b=1):
>
> explain analyze SELECT count(*) FROM foo WHERE a=1 and b=1;
>
>
>
> Vanilla:
>
>  Aggregate  (cost=11035.86..11035.87 rows=1 width=8) (actual
> time=22.746..22.746 rows=1 loops=1)
>    ->  Bitmap Heap Scan on foo  (cost=291.35..11001.97 rows=13553
> width=0) (actual time=9.055..18.711 rows=50000 loops=1)
>          Recheck Cond: ((a = 1) AND (b = 1))
>          Heap Blocks: exact=222
>          ->  Bitmap Index Scan on foo_a_b_idx  (cost=0.00..287.96
> rows=13553 width=0) (actual time=9.005..9.005 rows=50000 loops=1)
>                Index Cond: ((a = 1) AND (b = 1))
>
>
> ----------------------------------------------------------------------
>
> Vanilla + extra statistic (create statistics ab on a,b from foo):
>
>  Aggregate  (cost=12693.35..12693.36 rows=1 width=8) (actual
> time=22.747..22.748 rows=1 loops=1)
>    ->  Bitmap Heap Scan on foo  (cost=1490.08..12518.31 rows=70015
> width=0) (actual time=9.399..18.636 rows=50000 loops=1)
>          Recheck Cond: ((a = 1) AND (b = 1))
>          Heap Blocks: exact=222
>          ->  Bitmap Index Scan on foo_a_b_idx (cost=0.00..1472.58
> rows=70015 width=0) (actual time=9.341..9.341 rows=50000 loops=1)
>                Index Cond: ((a = 1) AND (b = 1))
>
> ----------------------------------------------------------------------
>
> Multicolumn index statistic:
>
>  Aggregate  (cost=11946.35..11946.36 rows=1 width=8) (actual
> time=25.117..25.117 rows=1 loops=1)
>    ->  Bitmap Heap Scan on foo  (cost=1080.47..11819.51 rows=50736
> width=0) (actual time=11.568..21.362 rows=50000 loops=1)
>          Recheck Cond: ((a = 1) AND (b = 1))
>          Heap Blocks: exact=222
>          ->  Bitmap Index Scan on foo_a_b_idx (cost=0.00..1067.79
> rows=50736 width=0) (actual time=11.300..11.300 rows=50000 loops=1)
>                Index Cond: ((a = 1) AND (b = 1))
>

I wonder what happened to this alternative approach, relying on stats
from multicolumn indexes ...

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2018-09-10 23:23:00 Re: speeding up planning with partitions
Previous Message Dmitry Dolgov 2018-09-10 21:47:06 Re: Index Skip Scan