Re: a wrong index choose when statistics is out of date

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: a wrong index choose when statistics is out of date
Date: 2024-03-08 11:53:14
Message-ID: 877cicao6e.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


After some more thoughts about the diference of the two ideas, then I
find we are resolving two different issues, just that in the wrong index
choose cases, both of them should work generally.

Your idea actually adding some rule based logic named certainty_factor,
just the implemenation is very grace. for the example in this case, it
take effects *when the both indexes has the same cost*. I believe that
can resolve the index choose here, but how about the rows estimation?
issue due to the fact that the design will not fudge the cost anyway, I
assume you will not fudge the rows or selectivity as well. Then if the
optimizer statistics is missing, what can we do for both index choosing
and rows estimation? I think that's where my idea comes out.

Due to the fact that optimizer statistics can't be up to date by design,
and assume we have a sistuation where the customer's queries needs that
statistcs often, how about doing the predication with the history
statistics? it can cover for both index choose and rows estimation. Then
the following arguments may be arised. a). we can't decide when the
missed optimizer statistics is wanted *automatically*, b). if we
predicate the esitmiation with the history statistics, the value of MCV
information is missed. The answer for them is a). It is controlled by
human with the "alter table t alter column a set
(force_generic=on)". b). it can't be resolved I think, and it only take
effects when the real Const is so different from the ones in
history. generic plan has the same issue I think.

I just reviewed the bad queries plan for the past half years internally,
I found many queries used the Nested loop which is the direct cause. now
I think I find out a new reason for this, because the missed optimizer
statistics cause the rows in outer relation to be 1, which make the Nest
loop is choosed. I'm not sure your idea could help on this or can help
on this than mine at this aspect.

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2024-03-08 12:05:04 Re: Statistics Import and Export
Previous Message Jelte Fennema-Nio 2024-03-08 11:47:15 Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs