PATCH: index-only scans with partial indexes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: PATCH: index-only scans with partial indexes
Date: 2015-07-10 18:29:43
Message-ID: 55A00F17.1020608@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

currently partial indexes end up not using index only scans in most
cases, because check_index_only() is overly conservative, as explained
in this comment:

* XXX this is overly conservative for partial indexes, since we will
* consider attributes involved in the index predicate as required even
* though the predicate won't need to be checked at runtime. (The same
* is true for attributes used only in index quals, if we are certain
* that the index is not lossy.) However, it would be quite expensive
* to determine that accurately at this point, so for now we take the
* easy way out.

In other words, unless you include columns from the index predicate to
the index, the planner will decide index only scans are not possible.
Which is a bit unfortunate, because those columns are not needed at
runtime, and will only increase the index size (and the main benefit of
partial indexes is size reduction).

The attached patch fixes this by only considering clauses that are not
implied by the index predicate. The effect is simple:

create table t as select i as a, i as b from
generate_series(1,10000000) s(i);

create index tidx_partial on t(b) where a > 1000 and a < 2000;

vacuum freeze t;
analyze t;

explain analyze select count(b) from t where a > 1000 and a < 2000;

QUERY PLAN
-----------------------------------------------------------------------
Aggregate (cost=39.44..39.45 rows=1 width=4)
(actual time=8.350..8.354 rows=1 loops=1)
-> Index Scan using tidx_partial on t
(cost=0.28..37.98 rows=585 width=4)
(actual time=0.034..4.368 rows=999 loops=1)
Planning time: 0.197 ms
Execution time: 8.441 ms
(4 rows)

explain analyze select count(b) from t where a > 1000 and a < 2000;

QUERY PLAN
-----------------------------------------------------------------------
Aggregate (cost=33.44..33.45 rows=1 width=4)
(actual time=8.019..8.023 rows=1 loops=1)
-> Index Only Scan using tidx_partial on t
(cost=0.28..31.98 rows=585 width=4)
(actual time=0.036..4.165 rows=999 loops=1)
Heap Fetches: 0
Planning time: 0.188 ms
Execution time: 8.106 ms
(5 rows)

I've done a bunch of tests, and I do see small (hardly noticeable)
increase in planning time with long list of WHERE clauses, because all
those need to be checked against the index predicate. Not sure if this
is what's meant by 'quite expensive' in the comment. Moreover, this was
more than compensated by the IOS benefits (even with everything in RAM).

But maybe it's possible to fix that somehow? For example, we're
certainly doing those checks elsewhere when deciding which clauses need
to be evaluated at run-time, so maybe we could cache that somehow?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
partial-index-only-scan-v1.patch text/x-diff 2.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Smitha Pamujula 2015-07-10 18:32:09 Re: pg_upgrade + Extensions
Previous Message Mike Blackwell 2015-07-10 18:01:17 Re: pg_upgrade + Ubuntu