Re: PATCH: index-only scans with partial indexes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: index-only scans with partial indexes
Date: 2015-07-11 12:00:53
Message-ID: 55A10575.90301@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 07/10/2015 10:43 PM, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> currently partial indexes end up not using index only scans in most
>> cases, because check_index_only() is overly conservative, as explained
>> in this comment:
>> ...
>
>> I've done a bunch of tests, and I do see small (hardly noticeable)
>> increase in planning time with long list of WHERE clauses, because all
>> those need to be checked against the index predicate. Not sure if this
>> is what's meant by 'quite expensive' in the comment. Moreover, this was
>> more than compensated by the IOS benefits (even with everything in RAM).
>
>> But maybe it's possible to fix that somehow? For example, we're
>> certainly doing those checks elsewhere when deciding which clauses need
>> to be evaluated at run-time, so maybe we could cache that somehow?
>
> The key problem here is that you're doing those proofs vastly earlier
> than before, for indexes that might not get used at all in the final
> plan. If you do some tests with multiple partial indexes you will
> probably see a bigger planning-time penalty.

Hmmm. Maybe we could get a bit smarter by looking at the attnums of each
clause before doing the expensive stuff (which is predicate_implied_by,
I believe), exploiting a few simple observations:

* if the clause is already covered by attrs_used, we don't need to
process it at all

* if the clause uses attributes not included in the index predicate,
we know it can't be implied

Of course, those are local optimizations, and can't fix some of the
problems (e.g. a lot of partial indexes).
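The two observations above can be sketched as cheap bitmap checks. This is
not actual PostgreSQL code - attribute sets are modeled as plain uint32
bitmaps here, where the real planner would use Bitmapset (bms_is_subset,
bms_overlap), and the function name is made up for illustration:

```c
/*
 * Sketch: cheap attnum-based pre-checks that could let us skip the
 * expensive predicate_implied_by() proof for a clause. Bit i set means
 * attribute i is referenced.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t AttrSet;

static bool
can_skip_implication_test(AttrSet clause_attrs,
                          AttrSet attrs_used,      /* columns fetched anyway */
                          AttrSet predicate_attrs) /* attrs in index predicate */
{
    /* Observation 1: the clause only touches columns already covered
     * by attrs_used, so proving implication changes nothing. */
    if ((clause_attrs & ~attrs_used) == 0)
        return true;

    /* Observation 2: the clause references an attribute the index
     * predicate doesn't mention, so the predicate can't imply it. */
    if ((clause_attrs & ~predicate_attrs) != 0)
        return true;

    return false;   /* no shortcut - run predicate_implied_by() */
}
```

Note the two shortcuts have opposite outcomes (the first means the clause is
irrelevant, the second means implication is impossible); both just avoid the
proof itself.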

> Perhaps we should bite the bullet and do it anyway, but I'm pretty
> suspicious of any claim that the planning cost is minimal.

Perhaps - I'm not claiming the planning cost is minimal. It was minimal
in the tests I've done, but no doubt it's possible to construct examples
where the planning time gets much worse. With 30 partial indexes, I got
an increase from 0.01 ms to ~2.5 ms on simple queries.

But maybe we could get at least some of the benefits by planning the
index scans as we do today, and then doing the IOS check later? Of
course, this won't help with cases where the index scan is thrown away
while the index-only scan would win, but it does help with cases where
we end up doing an index scan anyway.

That's essentially what I'm struggling with right now - I have a 3TB
data set, and the plan looks like this:

                               QUERY PLAN
------------------------------------------------------------------------
 Sort  (cost=1003860164.92..1003860164.92 rows=1 width=16)
   Sort Key: orders.o_orderpriority
   ->  HashAggregate
         Group Key: orders.o_orderpriority
         ->  Merge Semi Join
               Merge Cond:
               ->  Index Scan using pk_orders on orders
                     Filter: ((o_orderdate >= '1997-07-01'::date) AND
                      (o_orderdate < '1997-10-01 00:00:00'::timestamp))
               ->  Index Scan using lineitem_l_orderkey_idx_part1 on
                   lineitem

and the visibility checks from the Index Scans are killing the I/O. An
IOS is likely to perform much better here (but I haven't run the query
yet).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
