Re: A wrong index choose issue because of inaccurate statistics

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
Subject: Re: A wrong index choose issue because of inaccurate statistics
Date: 2020-06-05 03:32:43
Message-ID: CAKU4AWoNaJROJZUakdt6+d-h-yDxAfxtahqpjov0Hz0xdLbQLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>
>
> Why will the (a, c) be choose? If planner think a = x has only 1 row ..
>

I just did more research and found above statement is not accurate,
the root cause of this situation is because IndexSelectivity = 0. Even
through I
don't think we can fix anything here since IndexSelectivity is calculated
from
statistics and we don't know it is 1% wrong or 10% wrong or more just like
What
Tomas said.

The way of fixing it is just add a "small" extra cost for pqquals for index
scan. that should be small enough to not have impacts on others part. I will
discuss how small is small later with details, we can say it just a guc
variable
for now.

+++ b/src/backend/optimizer/path/costsize.c
@@ -730,6 +730,13 @@ cost_index(IndexPath *path, PlannerInfo *root, double
loop_count,

cpu_run_cost += cpu_per_tuple * tuples_fetched;

+ /*
+ * To make the planner more robust to handle some inaccurate
statistics
+ * issue, we will add a extra cost to qpquals so that the less
qpquals
+ * the lower cost it has.
+ */
+ cpu_run_cost += stat_stale_cost * list_length(qpquals);
+

If we want to reduce the impact of this change further, we can only add
this if
the IndexSelecivity == 0.

How to set the value of stat_stale_cost? Since the minimum cost for a query
should be a cpu_tuple_cost which is 0.01 default. Adding an 0.01 cost for
each
pqqual in index scan should not make a big difference. However sometimes
we may
set it to 0.13 if we consider index->tree_height was estimated wrongly for
1 (cost is
50 * 0.0025 = 0.125). I don't know how it happened, but looks it do happen
in prod
environment. At the same time it is unlikely index->tree_height is estimated
wrongly for 2 or more. so basically we can set this value to 0(totally
disable
this feature), 0.01 (should be ok for most case), 0.13 (A bit aggressive).

The wrong estimation of IndexSelectitity = 0 might be common case and if
people just have 2 related index like (A, B) and (A, C). we have 50%
chances to
have a wrong decision, so I would say this case worth the troubles. My
current
implementation looks not cool, so any suggestion to research further is
pretty
welcome.

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-06-05 04:20:58 Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Previous Message Andres Freund 2020-06-05 03:03:10 Re: Atomic operations within spinlocks