Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, david(dot)rowley(at)2ndquadrant(dot)com
Cc: dean(dot)a(dot)rasheed(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, michael(at)paquier(dot)xyz, andres(at)anarazel(dot)de, thomas(dot)munro(at)enterprisedb(dot)com, bruce(at)momjian(dot)us, hornschnorter(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-03-15 00:06:34
Message-ID: c2406825-6ebe-26e6-91b3-ba94026546a6@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/14/19 12:56 PM, Kyotaro HORIGUCHI wrote:
> At Wed, 13 Mar 2019 19:37:45 +1300, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote in <CAKJS1f_6qDQj9m2H0jF4bRkZVLpfc7O9E+MxdXrq0wgv0z1NrQ(at)mail(dot)gmail(dot)com>
>> On Wed, 13 Mar 2019 at 17:20, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> bms_member_index seems working differently than maybe expected.
>>>
>>> bms_member_index((2, 4), 0) => 0, (I think) should be -1
>>> bms_member_index((2, 4), 1) => 0, should be -1
>>> bms_member_index((2, 4), 2) => 0, should be 0
>>> bms_member_index((2, 4), 3) => 1, should be -1
>>> bms_member_index((2, 4), 4) => 1, should be 1
>>> bms_member_index((2, 4), 5) => 2, should be -1
>>> bms_member_index((2, 4), 6) => 2, should be -1
>>> ...
>>> bms_member_index((2, 4), 63) => 2, should be -1
>>> bms_member_index((2, 4), 64) => -1, correct
>>>
>>> It works correctly only when x is a member - the way the function
>>> is maybe actually used in this patch -, or needs to change the
>>> specifiction (or the comment) of the function.
>>
>> Looks like:
>>
>> + if (wordnum >= a->nwords)
>> + return -1;
>>
>> should be:
>>
>> + if (wordnum >= a->nwords ||
>> + (a->word[wordnum] & ((bitmapword) 1 << bitnum)) == 0)
>> + return -1;
>
> Yeah, seems right.
>

Yep, that was broken. The attached patch fixes this by simply calling
bms_is_member, instead of copying the checks into bms_member_index.

I've also reworked the regression tests to use a function extracting the
cardinality estimates, as proposed by Dean and David. I have not reduced
the size of data sets yet, so the tests are not much faster, but we no
longer check the exact query plan. That's probably a good idea anyway.
Actually - the tests are a bit faster because it allows removing indexes
that were used for the query plans.

FWIW I've noticed an annoying thing when modifying type of column not
included in a statistics. Consider this:

create table t (a int, b int, c text);
insert into t select mod(i,10), mod(i,10), ''
from generate_series(1,10000) s(i);
create statistics s (dependencies) on a,b from t;
analyze t;

explain analyze select * from t where a = 1 and b = 1;

QUERY PLAN
---------------------------------------------------------------------
Seq Scan on t (cost=0.00..205.00 rows=1000 width=9)
(actual time=0.014..1.910 rows=1000 loops=1)
Filter: ((a = 1) AND (b = 1))
Rows Removed by Filter: 9000
Planning Time: 0.119 ms
Execution Time: 2.234 ms
(5 rows)

alter table t alter c type varchar(61);

explain analyze select * from t where a = 1 and b = 1;

QUERY PLAN
---------------------------------------------------------------------
Seq Scan on t (cost=0.00..92.95 rows=253 width=148)
(actual time=0.020..2.420 rows=1000 loops=1)
Filter: ((a = 1) AND (b = 1))
Rows Removed by Filter: 9000
Planning Time: 0.128 ms
Execution Time: 2.767 ms
(5 rows)

select stxdependencies from pg_statistic_ext;

stxdependencies
------------------------------------------
{"1 => 2": 1.000000, "2 => 1": 1.000000}
(1 row)

That is, we don't remove the statistics, but the estimate still changes.
But that's because the ALTER TABLE also resets reltuples/relpages:

select relpages, reltuples from pg_class where relname = 't';

relpages | reltuples
----------+-----------
0 | 0
(1 row)

That's a bit unfortunate, and it kinda makes the whole effort to not
drop the statistics unnecessarily kinda pointless :-(

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-multivariate-MCV-lists-20190315.patch.gz application/gzip 39.7 KB
0002-multivariate-histograms-20190315.patch.gz application/gzip 48.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shawn Debnath 2019-03-15 00:26:11 Re: Introduce timeout capability for ConditionVariableSleep
Previous Message Peter Eisentraut 2019-03-14 23:40:05 Re: propagating replica identity to partitions