Re: a wrong index choose when statistics is out of date

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: a wrong index choose when statistics is out of date
Date: 2024-03-31 03:53:12
Message-ID: 87a5mf837g.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello everyone,

> After some more thoughts about the diference of the two ideas, then I
> find we are resolving two different issues, just that in the wrong index
> choose cases, both of them should work generally.

Here is the formal version for the attribute reloptions direction.

commit 0d842e39275710a544b11033f5eec476147daf06 (HEAD -> force_generic)
Author: yizhi.fzh <yizhi(dot)fzh(at)alibaba-inc(dot)com>
Date: Sun Mar 31 11:51:28 2024 +0800

Add a attopt to disable MCV when estimating for Var = Const

As of current code, when calculating the selectivity for Var = Const,
planner first checks if the Const is an most common value and if not, it
takes out all the portions of MCV's selectivity and num of distinct
value, and treat the selectivity for Const equal for the rest
n_distinct.

This logic works great when the optimizer statistic is up to date,
however if the known most common value has taken up most of the
selectivity at the last run of analyze, and the new most common value in
reality has not been gathered, the estimation for the new MCV will be
pretty bad. A common case for this would be created_at = {current_date};

To overcome this issue, we provides a new syntax:

ALTER TABLE tablename ALTER COLUMN created_at SET (force_generic=on);

After this, planner ignores the value of MCV for this column when
estimating for Var = Const and treating all the values equally.

This would cause some badness if the values for a column are pretty not
equal which is what MCV is designed for, however this patch just provide
one more option to user and let user make the decision.

Here is an example about its user case.

create table t(a int, b int, c int) with (autovacuum_enabled=off);
create index on t(a, b);
create index on t(a, c);
create table t2 (id int primary key, a int);
insert into t2 select i , i from generate_series(1, 800)i;

insert into t select floor(random() * 100 + 1)::int, i, i
from generate_series(1, 100000) i;
analyze t,t2;

insert into t
select floor(random() * 10 + 1)::int + 100 , i, i
from generate_series(1, 10000) i;

explain (costs off) select * from t where a = 109 and b = 8;
explain (costs off, analyze)
select * from t join t2 on t.c = t2.id where t.a = 109;

ALTER TABLE t ALTER COLUMN a SET (force_generic=on);

-- We will see some good result now.
explain (costs off) select * from t where a = 109 and b = 8;
explain (costs off, analyze)
select * from t join t2 on t.c = t2.id where t.a = 109;

I will add this to our commitfest application, any feedback is welcome!

--
Best Regards
Andy Fan

Attachment Content-Type Size
v1-0001-Add-a-attopt-to-disable-MCV-when-estimating-for-V.patch text/x-diff 7.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2024-03-31 04:00:15 Re: Change GUC hashtable to use simplehash?
Previous Message Dmitry Koval 2024-03-31 02:12:19 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands