Re: index-only scans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: index-only scans
Date: 2011-08-12 15:13:57
Message-ID: CA+TgmoYcW+YGOvWaX-DJbe7yQXbYGyxLo_5cgntfvh+qee4eUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 12, 2011 at 9:31 AM, Cédric Villemain
<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
>> Well... PostgreSQL can only use the index on a or the index on b, not
>> both.  This patch doesn't change that.  I'm not trying to use indexes
>> in some completely new way; I'm just trying to make them faster by
>> optimizing away the heap access.
>
> For this kind of plan :
> Bitmap Heap Scan
>  Recheck Cond
>   BitmapAnd
>      Bitmap Index Scan
>      Bitmap Index Scan
>
> It may prevent useless Heap Fetch during "Bitmap Heap Scan", isn't it ?

The input to the bitmap heap scan is just a TID bitmap. It's too late
to decide we want the index tuples at that point; we've forgotten
where they are, and even if we remembered it, it would necessarily be
any cheaper than checking the heap. We could optimize this to avoid a
heap fetch in the case where we don't need any of the tuple data at
all, but that's going to be somewhat rare, I think.

>>>> Actually, I can see a possible way to allow an index-only type
>>>> optimization to be used for bitmap scans.  As you scan the index, any
>>>> tuples that can be handled index-only get returned immediately; the
>>>> rest are thrown into a bitmap.  Once you're done examining the index,
>>>> you then do a bitmap heap scan to get the tuples that couldn't be
>>>> handled index-only.  This seems like it might be our best hope for a
>>>> "fast count(*)" type optimization, especially if you could combine it
>>>> with some method of scanning the index in physical order rather than
>>>> logical order.
>>>
>>> IIRC we expose some ideas around that, yes. (optimizing bitmap)
>>>
>>> Maybe a question that will explain me more about the feature
>>> limitation (if any):
>>> Does an index-only scan used when the table has no vismap set will
>>> cost (in duration, IO, ...) more than a normal Index scan ?
>>
>> Yeah, it'll do a bit of extra work - the btree AM will cough up the
>> tuple uselessly, and we'll check the visibility map, also uselessly.
>> Then we'll end up doing it the regular way anyhow.  I haven't measured
>> that effect yet; hopefully it's fairly small.
>
> If it is small, or if we can reduce it to be near absent.
> Then... why do we need to distinguish Index Scan at all ? (or
> Index*-Only* scan, which aren't 100% 'Only' btw)
> It is then just an internal optimisation on how we can access/use an
> index. (really good and promising one)

Right, that's kind of what I was going for. But the overhead isn't
going to be exactly zero, so I think it makes sense to disable it in
the cases where it clearly can't work. The question is really more
whether we need to get more fine-grained than that, and actually do a
cost-based comparison of an index-only scan vs. a regular index scan.
I hope not, but I haven't tested it yet.

One other thing to think about is that the choice to use an index-scan
is not free of externalities. The index-only scan is hopefully faster
than touching all the heap pages, but even if it weren't, not touching
all of the heap pages potentially means avoiding evicting things from
shared_buffers that some *other* query might need.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Byrne 2011-08-12 15:16:18 Re: Possible Bug in pg_upgrade
Previous Message Robert Haas 2011-08-12 15:04:55 Re: WIP: Fast GiST index build