Re: TODO item: Allow data to be pulled directly from indexes

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Karl Schnaitter <karlsch(at)soe(dot)ucsc(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: TODO item: Allow data to be pulled directly from indexes
Date: 2008-08-20 20:20:03
Message-ID: 200808202020.m7KKK3C20520@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


I have added this email's URL to TODO under tuple visibility.

---------------------------------------------------------------------------

Karl Schnaitter wrote:
> Sometime last year, a discussion started about including visibility
> metadata to avoid heap fetches during an index scan:
>
> http://archives.postgresql.org/pgsql-patches/2007-10/msg00166.php
> http://archives.postgresql.org/pgsql-patches/2008-01/msg00049.php
>
> I think the last discussion on this was in April:
>
> http://archives.postgresql.org/pgsql-hackers/2008-04/msg00618.php (last
> item)
>
> I have worked with the current patch, and I have some thoughts about
> that approach and the approaches listed in the TODO item. The TODO lists
> three approaches, in short
>
> (1) Add a bit for an index tuple that indicates "visible" or "maybe
> visible."
> (2) Use a per-table bitmap that indicates which pages have at least one
> tuple that is not visible to all transactions.
> (3) Same as (2) but at the granularity of one bit per table.
>
> The approach in the patch is different:
>
> (4) Add transaction ids, etc to the index tuple (totaling 16 bytes)
>
> I would group (1) & (4) together and (2) & (3) together. I think the
> time and space trade-offs are pretty obvious, so I won't waste time on
> those.
>
> (1) & (4) require an UPDATE or DELETE to twiddle the old index tuple.
> Tom has noted (in the linked message) that this is not reliable if the
> index has any expression-valued columns, because it is not always
> possible to find the old index entry. For this reason, the proposed
> patch does not keep visibility metadata for indexes on expressions. This
> seems like a reasonable limitation --- indexed expressions are just less
> efficient.
>
> The main difference between (1) & (4) is that (1) will sometimes require
> heap lookups and (4) never will. Moreover, the heap lookups in (1) will
> be difficult for the optimizer to estimate, unless some special
> statistics can be maintained for this purpose.
>
> I should mention there is a major flaw in the patch, because it puts
> pointers to HOT tuples in the index, in order to capture the different
> transaction ids in the chain. I think this can be fixed by only pointing
> to the root of the HOT chain, and setting xmin/xmax to the entire range
> of transaction ids spanned by the chain. I'm not sure about all the
> details (the ctid and some other bits also need to be set).
>
> (2) & (3) can work for any index, and they are quite elegant in the way
> that the overhead does not change with the number of indexes. The TODO
> also notes the benefit of (2) for efficient vacuuming. Thus, I think
> that (2) is a great idea in general, but it does not serve the intended
> purpose of this TODO item. Once a page gets marked as requiring
> visibility checks, it cannot be unmarked until the next VACUUM. The
> whole point of this feature is that we are willing to be more proactive
> during updates in order to make index access more efficient.
>
> So in summary, I think that (2) would be nice as a separate feature,
> with (1) and (4) being more favorable for index-only scans. The obvious
> trouble with (4) is the extra space overhead. There are also issues with
> correctness that I mentioned (any thoughts here would be appreciated).
> Other than that, I would favor (4) because it offers the most stable
> performance.
>
> Please let me know if you agree/disagree with anything here. I need to
> get this feature implemented for my research, but I would also love to
> contribute it to the community so your opinions matter a lot.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2008-08-20 21:08:17 Re: make dist does not work in VPATH
Previous Message David Fetter 2008-08-20 18:34:47 Re: SHOW ALL doesn't actually SHOW ALL