Re: a wrong index choose when statistics is out of date

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: a wrong index choose when statistics is out of date
Date: 2024-03-07 10:16:52
Message-ID: 87msra9vgo.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


David Rowley <dgrowleyml(at)gmail(dot)com> writes:

> On Wed, 6 Mar 2024 at 02:09, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>> This patch introduces a new attoptions like this:
>>
>> ALTER TABLE t ALTER COLUMN col set (force_generic=true);
>>
>> Then selfunc.c realizes this and ignore the special Const value, then
>> average selectivity is chosen. This fall into the weakness of generic
>> plan, but this patch doesn't introduce any new weakness and we leave the
>> decision to user which could resolve some problem. Also this logic only
>> apply to eqsel since the ineqsel have the get_actual_variable_range
>> mechanism which is helpful for index choose case at least.
>
> If you don't want the planner to use the statistics for the column why
> not just do the following?

Acutally I didn't want the planner to ignore the statistics totally, I
want the planner to treat the "Const" which probably miss optimizer part
average, which is just like what we did for generic plan for the blow
query.

prepare s as SELECT * FROM t WHERE a = $1 and b = $2;
explain (costs off) execute s(109, 8);
QUERY PLAN
---------------------------------
Index Scan using t_a_c_idx on t
Index Cond: (a = 109)
Filter: (b = 8)

(3 rows)

custom plan, Wrong index due to we have a bad estimation for a = 109.

set plan_cache_mode to force_generic_plan ;
explain (costs off) execute s(109, 8);
QUERY PLAN
---------------------------------------
Index Scan using t_a_b_idx on t
Index Cond: ((a = $1) AND (b = $2)) -- Correct index.
(2 rows)

Generic plan - we use the average estimation for the missed optimizer
statistics part and *if the new value is not so different from existing
ones*, we can get a disired result.

It is true that the "generic" way is not as exactly accurate as the
"custom" way since the later one can use the data in MCV, but that is
the cost we have to pay to make the missed optimizer statistics less
imporant and generic plan has the same issue as well. As for this
aspect, I think the way you proposed probably have a wider use case.

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2024-03-07 10:20:41 Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)
Previous Message Jelte Fennema-Nio 2024-03-07 10:11:32 Re: [EXTERNAL] Re: Add non-blocking version of PQcancel