Re: a wrong index choose when statistics is out of date

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: a wrong index choose when statistics is out of date
Date: 2024-04-28 02:39:27
Message-ID: 87h6fmjj2u.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Andy Fan <zhihuifan1213(at)163(dot)com> writes:

> Hello everyone,
>
>> After some more thoughts about the diference of the two ideas, then I
>> find we are resolving two different issues, just that in the wrong index
>> choose cases, both of them should work generally.
>
> Here is the formal version for the attribute reloptions direction.

> commit 0d842e39275710a544b11033f5eec476147daf06 (HEAD -> force_generic)
> Author: yizhi.fzh <yizhi(dot)fzh(at)alibaba-inc(dot)com>
> Date: Sun Mar 31 11:51:28 2024 +0800
>
> Add a attopt to disable MCV when estimating for Var = Const
>
> As of current code, when calculating the selectivity for Var = Const,
> planner first checks if the Const is an most common value and if not, it
> takes out all the portions of MCV's selectivity and num of distinct
> value, and treat the selectivity for Const equal for the rest
> n_distinct.
>
> This logic works great when the optimizer statistic is up to date,
> however if the known most common value has taken up most of the
> selectivity at the last run of analyze, and the new most common value in
> reality has not been gathered, the estimation for the new MCV will be
> pretty bad. A common case for this would be created_at = {current_date};
>
> To overcome this issue, we provides a new syntax:
>
> ALTER TABLE tablename ALTER COLUMN created_at SET (force_generic=on);
>
> After this, planner ignores the value of MCV for this column when
> estimating for Var = Const and treating all the values equally.
>
> This would cause some badness if the values for a column are pretty not
> equal which is what MCV is designed for, however this patch just provide
> one more option to user and let user make the decision.
>
> Here is an example about its user case.

...

Here are some add-ups for this feature:

- After the use this feature, we still to gather the MCV on these
columns because they are still useful for the join case, see
eqjoinsel_inner function.

- Will this feature make some cases worse since it relies on the fact
that not using the MCV list for var = Const? That's is true in
theory. But if user use this feature right, they will not use this
feature for these columns. The feature is just designed for the user
case in the commit message and the theory is exactly same as generic
plan. If user uses it right, they may save the effort of run 'analyze'
pretty frequently and get some better result on both index choose and
rows estimation. Plus the patch is pretty not aggressive and it's easy
to maintain.

- Is the 'force_generic' a good name for attribute option? Probably not,
we can find out a good name after we agree on this direction.

- Will it be conflicted with David's idea of certainty_factor? Probably
not,even both of them can handle the index-choose-case. See my point
on [1]

[1] https://www.postgresql.org/message-id/877cicao6e.fsf%40163.com

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joseph Koshakow 2024-04-28 02:59:44 Fix overflow hazard in timestamp_pl_interval
Previous Message Andy Fan 2024-04-28 02:07:01 Re: using extended statistics to improve join estimates