Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
Date: 2020-05-26 22:49:40
Message-ID: CAKU4AWrS8d-Hpr4_wV1F+Oudn_NGdUYWnQpGPKYChVCNU3BY2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 26, 2020 at 9:59 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> On Tue, May 26, 2020 at 1:52 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> >
> >
> > Consider the below example:
> >
> > create table j1(i int, im5 int, im100 int, im1000 int);
> > insert into j1 select i, i%5, i%100, i%1000 from generate_series(1,
> 10000000)i;
> > create index j1_i_im5 on j1(i, im5);
> > create index j1_i_im100 on j1(i, im100);
> > analyze j1;
> > explain select * from j1 where i = 100 and im5 = 5;
> >
> > We may get the plan like this:
> >
> > demo=# explain select * from j1 where i = 100 and im5 = 1;
> > QUERY PLAN
> > ----------------------------------------------------------------------
> > Index Scan using j1_i_im100 on j1 (cost=0.43..8.46 rows=1 width=16)
> > Index Cond: (i = 100)
> > Filter: (im5 = 1)
> > (3 rows)
> >
> > At this case, optimizer can estimate there are only 1 row to return, so
> both
> > indexes have same cost, which one will be choose is un-controlable. This
> is
> > fine for above query based on the estimation is accurate. However
> estimation
> > can't be always accurate in real life. Some inaccurate estimation can
> cause an
> > wrong index choose. As an experience, j1_i_im5 index should always be
> choose
> > for above query.
>
> I think we need a better example where choosing an index makes a
> difference.
>
> An index can be chosen just because it's path was created before some
> other more appropriate index but the cost difference was within fuzzy
> limit. Purely based on the order in which index paths are created.
>

Here is an further example with the above case:

demo=# insert into j1 select 1, 1, 1, 1 from generate_series(1, 100000)i;
INSERT 0 100000

With the current implementation, it is

demo=# explain analyze select * from j1 where i = 1 and im5 = 2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Scan using j1_i_im100 on j1 (cost=0.43..8.44 rows=1 width=16)
(actual time=63.431..63.431 rows=0 loops=1)
Index Cond: (i = 1)
Filter: (im5 = 2)
Rows Removed by Filter: 100001
Planning Time: 0.183 ms
Execution Time: 63.484 ms
(6 rows)

With the patch above, it can always choose a correct index even the
statistics is inaccurate:

demo=# explain analyze select * from j1 where i = 1 and im5 = 2;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Index Scan using j1_i_im5 on j1 (cost=0.43..8.46 rows=1 width=16) (actual
time=0.030..0.030 rows=0 loops=1)
Index Cond: ((i = 1) AND (im5 = 2))
Planning Time: 1.087 ms
Execution Time: 0.077 ms
(4 rows)

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-05-26 23:12:52 Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
Previous Message Alvaro Herrera 2020-05-26 22:27:53 Re: hash join error improvement (old)