Make the qual cost on index Filter slightly higher than qual cost on index Cond.

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
Date: 2020-05-26 08:22:01
Message-ID: CAKU4AWqzTCe12EYPzTCon0+xwzoD8dWVdpqdzHH1swc=Gs5Xbg@mail.gmail.com
Lists: pgsql-hackers

Consider the below example:

create table j1(i int, im5 int, im100 int, im1000 int);
insert into j1 select i, i%5, i%100, i%1000 from generate_series(1, 10000000) i;
create index j1_i_im5 on j1(i, im5);
create index j1_i_im100 on j1(i, im100);
analyze j1;
explain select * from j1 where i = 100 and im5 = 1;

We may get the plan like this:

demo=# explain select * from j1 where i = 100 and im5 = 1;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using j1_i_im100 on j1  (cost=0.43..8.46 rows=1 width=16)
   Index Cond: (i = 100)
   Filter: (im5 = 1)
(3 rows)

In this case the optimizer estimates that only 1 row will be returned, so both
indexes have the same cost, and which one gets chosen is uncontrollable. That
is fine for the query above as long as the estimation is accurate. However,
estimations can't always be accurate in real life, and an inaccurate
estimation can cause the wrong index to be chosen. From experience, the
j1_i_im5 index should always be chosen for the query above.

This one-line change is the best method I can think of:

- cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ cpu_per_tuple = cpu_tuple_cost + (qpqual_cost.per_tuple * 1.001);

This makes the qual cost of an index Filter slightly higher than the qual cost
of an Index Cond. It would also help for a qual like (i = x AND m = y AND
n = z) with indexes on (i, m, other_col1) and (i, other_col1, other_col2).
The change does also alter the relationship between the qual cost on an index
scan and the qual cost on a seq scan, but I think that impact is tiny enough
to ignore (we could choose a better factor between 1 and 1.001).

The root cause of this issue is an inaccurate estimation, but I don't think
that is easy (or even possible) to fix; I'm open to suggestions on that as
well.

Any suggestions?

--
Best Regards
Andy Fan
