From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: the big picture for index-only scans |
Date: | 2011-05-10 13:22:31 |
Message-ID: | BANLkTikZY8RzB9ZcqbxZNs3=Rd=8MMkZ0g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> 1. The visibility map needs to be crash-safe. The basic idea of
>> index-only scans is that, instead of checking the heap to find out
>> whether each tuple is visible, we first check the visibility map. If
>> the visibility map bit is set, then we know all tuples on the page are
>> visible to all transactions, and therefore the tuple of interest is
>> visible to our transaction. Assuming that a significant number of
>> visibility map bits are set, this should enable us to avoid a fair
>> amount of I/O, especially on large tables, because the visibility map
>> is roughly 8000 times smaller than the heap, and therefore far more
>> practical to keep in cache.
>
> hm, what are the implications for tuple hint bits, short and long
> term? I'm particularly interested if you think any hint bit i/o
> mitigation strategies are worth pursuing.
Well, I don't really want to let this thread on my project get
hijacked to talk about your project (not that I haven't been guilty of
that myself!) but, in brief, I think the main effect of index-only
scans is that the performance difference between a vacuumed table and
an unvacuumed table is going to increase. It's already the case that
sequential scanning a table which has been vacuumed (and, therefore,
all the pages are marked all-visible) is noticeably faster than
sequential scanning a table which is not vacuumed (even if all the
hint bits are set). Index-only scans are going to extend that by
making index scans run faster on a table with lots of all-visible
tables than on one where no pages are all-visible. So the importance
of vacuuming an insert-only table occasionally (which autovacuum won't
do, at present, until it's needed to prevent XID wraparound) is
already more than zero, and it's going to go up. But the all-visible
bits aren't quite the same as hint bits: I don't think there's any
impact on hint bits per se.
>> 2. Crash safe visibility map vs. pg_upgrade. Even if we make the
>> visibility map crash-safe in 9.2, people are going to want to use
>> pg_upgrade to migrate from older versions, bringing their
>> possibly-not-quite-correct visibility map forks along with them. How
>> should we handle that? We could (2A) arrange to have pg_upgrade nuke
>> all visibility forks when upgrading from a release where the
>> visibility map is not crash-safe to one where it is;
>
> +1 on 2A.
OK. Anybody else?
>> 3. Statistics. I believe that in order to accurately estimate the
>> cost of an index-only scan, we're going to need to know the fraction
>> of tuples that are on pages whose visibility map bits are set.
>
> It would be helpful to know the performance benefit of index only
> scans before knowing how much benefit to attribute here. Maybe a
> system wide kludge would for starters anyway, like assuming 60% of
> pages can be vis checked from the VM, or a single GUC, Then again,
> maybe not.
Yeah, maybe I should try to beat the main patch into some kind of
shape before working too much on the statistics stuff. Then we could
actually benchmark it a bit, which would be good. I don't think that
a system-wide kludge or GUC is going to work for prime time, but it's
probably fine for initial performance testing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Dimitri Fontaine | 2011-05-10 13:41:42 | Re: Formatting Curmudgeons WAS: MMAP Buffers |
Previous Message | Pavel Stehule | 2011-05-10 13:18:16 | Re: "stored procedures" - use cases? |