Re: the big picture for index-only scans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: the big picture for index-only scans
Date: 2011-05-10 13:22:31
Message-ID: BANLkTikZY8RzB9ZcqbxZNs3=Rd=8MMkZ0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> 1. The visibility map needs to be crash-safe.  The basic idea of
>> index-only scans is that, instead of checking the heap to find out
>> whether each tuple is visible, we first check the visibility map.  If
>> the visibility map bit is set, then we know all tuples on the page are
>> visible to all transactions, and therefore the tuple of interest is
>> visible to our transaction.  Assuming that a significant number of
>> visibility map bits are set, this should enable us to avoid a fair
>> amount of I/O, especially on large tables, because the visibility map
>> is roughly 8000 times smaller than the heap, and therefore far more
>> practical to keep in cache.
>
> hm, what are the implications for tuple hint bits, short and long
> term?  I'm particularly interested if you think any hint bit i/o
> mitigation strategies are worth pursuing.

Well, I don't really want to let this thread on my project get
hijacked to talk about your project (not that I haven't been guilty of
that myself!) but, in brief, I think the main effect of index-only
scans is that the performance difference between a vacuumed table and
an unvacuumed table is going to increase. It's already the case that
sequential scanning a table which has been vacuumed (and, therefore,
all the pages are marked all-visible) is noticeably faster than
sequential scanning a table which is not vacuumed (even if all the
hint bits are set). Index-only scans are going to extend that by
making index scans run faster on a table with lots of all-visible
tables than on one where no pages are all-visible. So the importance
of vacuuming an insert-only table occasionally (which autovacuum won't
do, at present, until it's needed to prevent XID wraparound) is
already more than zero, and it's going to go up. But the all-visible
bits aren't quite the same as hint bits: I don't think there's any
impact on hint bits per se.

>> 2. Crash safe visibility map vs. pg_upgrade.  Even if we make the
>> visibility map crash-safe in 9.2, people are going to want to use
>> pg_upgrade to migrate from older versions, bringing their
>> possibly-not-quite-correct visibility map forks along with them.  How
>> should we handle that?  We could (2A) arrange to have pg_upgrade nuke
>> all visibility forks when upgrading from a release where the
>> visibility map is not crash-safe to one where it is;
>
> +1 on 2A.

OK. Anybody else?

>> 3. Statistics.  I believe that in order to accurately estimate the
>> cost of an index-only scan, we're going to need to know the fraction
>> of tuples that are on pages whose visibility map bits are set.
>
> It would be helpful to know the performance benefit of index only
> scans before knowing how much benefit to attribute here.  Maybe a
> system wide kludge would for starters anyway, like assuming 60% of
> pages can be vis checked from the VM, or a single GUC, Then again,
> maybe not.

Yeah, maybe I should try to beat the main patch into some kind of
shape before working too much on the statistics stuff. Then we could
actually benchmark it a bit, which would be good. I don't think that
a system-wide kludge or GUC is going to work for prime time, but it's
probably fine for initial performance testing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2011-05-10 13:41:42 Re: Formatting Curmudgeons WAS: MMAP Buffers
Previous Message Pavel Stehule 2011-05-10 13:18:16 Re: "stored procedures" - use cases?