Re: PATCH: index-only scans with partial indexes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: kgrittn(at)ymail(dot)com, simon(at)2ndQuadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: index-only scans with partial indexes
Date: 2015-09-14 11:27:47
Message-ID: 55F6AF33.8000206@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/14/2015 12:51 PM, Kyotaro HORIGUCHI wrote:
> I rethinked on this from the first.
>
>> Sorry.
>>
>>> Hi, this looks to be a bug of cost_index(). The attached patch
>>> would fix that.
>>
>> No, that's wrong. please forget the patch. The qual in qpquals
>> should be indexquals which is excluded because it is not
>> necessary to be applied. The right way would be remove the cost
>> for qpqual in cost_index().
>
> Your patch allows index only scan even if a qual contains
> non-index column when the qual can be removed by implication from
> index predicates.
>
> So the 'implied' clauses is not needed ever after. It should be
> excluded from cost estimation and it is not needed on execution
> even if index only scan is found not to be doable finally.
>
> So the implicit quals may be removed on building index paths but
> I think check_index_only is not the place.
>
> Removing implied quals from index quals is not only for index
> *only* scan so the place for removing such quals is in
> build_index_paths, in the loop of step 1. After removing the
> quals there, check_index_only will naturally give disired result.
>
> # I remember that I have tried the same or similar thing. I don't
> # recall what made me give up then.

I don't think this is particularly related to the patch, because some of
the anomalies can be observed even on master. For example, let's force
the index scans to be non-IOS by requesting another column from the heap:

create table t (a int, b int, c int);
insert into t select i,i,i from generate_series(1,1000000) s(i);

create index idx1 on t (a) where b between 300000 and 600000;
create index idx2 on t (a,b) where b between 300000 and 600000;

analyze t;
vacuum t;

The indexes have exactly the same size (thanks to alignment of
IndexTuples), and should have exactly the same statistics:

test=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+------+-------+-------+-------+---------+-------------
public | idx1 | index | user | t | 6600 kB |
public | idx2 | index | user | t | 6600 kB |
(2 rows)

Now, let's see the query reading column 'c' (forcing heap fetches)

explain select c from t where b between 300000 and 600000;
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using idx1 on t (cost=0.42..10945.99 rows=300971 width=4)
(1 row)

drop index idx1;
set enable_bitmapscan = off;

explain select c from t where b between 300000 and 600000;
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using idx2 on t (cost=0.42..19688.08 rows=300971 width=4)
Index Cond: ((b >= 300000) AND (b <= 600000))
(2 rows)

I claim that either both or none of the indexes should use "Index Cond".

This is exactly the same reason that lead to the strange behavior after
applying the patch, but in this case the heap access actually introduces
some additional cost so the issue is not that obvious.

But in reality the costs should be pretty much exactly the same - the
indexes have exactly the same size, statistics, selectivity etc.

Also, the plan difference suggests this is not merely a costing issue,
because while with idx1 (first plan) it was correctly detected we don't
need to evaluate the condition on the partial index, on idx2 that's not
true and we'll waste time doing that. So we probably can't just tweak
the costing a bit - this probably needs to be addressed when actually
building the index path.

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-09-14 11:30:33 Re: row_security GUC, BYPASSRLS
Previous Message Amit Kapila 2015-09-14 11:25:31 Re: RFC: replace pg_stat_activity.waiting with something more descriptive