Re: Including Snapshot Info with Indexes

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Including Snapshot Info with Indexes
Date: 2007-10-08 08:40:26
Message-ID: 4709ECFA.4020905@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Gokulakannan Somasundaram wrote:
> Currently The index implementation in Postgresql does not store the
> Snapshot information in the Index. If we add the snapshot information into
> the indexing structure, we will have the following advantages.

This idea has been discussed to death many times before. Please search
the archives.

> a) There can be index only scans like Oracle

IMO, the most promising approach to achieving index-only-scans at the
moment is the Dead Space Map, as discussed in the 8.3 dev cycle.

> b) Unique indexes will become less costly, as older index tuples can be
> found out.

Doesn't seem like a big benefit, considering that in most cases there
won't be any tuples in the index with a duplicate key. A common
exception to that is (non-HOT) updating a row. But in that case, the
page containing the old tuple is already in cache, so the lookup of the
visibility from the heap is cheap.

> c) Even the index scans will get faster, since some of the index tuples
> won't translate into HeapScans.

That's the same as doing an index-only-scan, right?

> d) Deletes and Updates will become slightly costly, as they have to update
> these indexes.

I think you're grossly underestimating the cost of that. For example, on
a table with 3 indexes. a delete currently requires one index lookup +
one heap lookup. With visibility in the indexes, that would require 3
index lookups + one heap lookup. That's 4 vs. 2 page accesses, not
taking into account the non-leaf b-tree pages. The real impact will
depend on what's in cache, but the cost can be very high.

Also, the full visibility information would need 12 bytes of space per
tuple. An index tuple on an int4 key currently takes 12 bytes, so that
would double the index size. Storage size has a big impact on
performance. More bytes means more I/O, less data fits in cache, and
more WAL traffic.

There's non-trivial implementation issues involved as well. You'd need a
way to reliably find all the index pointers for a given heap tuple
(search the archives for "retail vacuum" for the issues involved in
that. Broken user defined functions are a problem for example). And
you'd need to keep them all locked at the same time to modify them all
atomically, which is prone to deadlocks.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2007-10-08 09:07:11 Re: 8.4 TODO item: make src/port support libpq and ecpg directly
Previous Message Gokulakannan Somasundaram 2007-10-08 06:42:09 Including Snapshot Info with Indexes

Browse pgsql-patches by date

  From Date Subject
Next Message Csaba Nagy 2007-10-08 09:08:17 Re: Including Snapshot Info with Indexes
Previous Message Gokulakannan Somasundaram 2007-10-08 06:42:09 Including Snapshot Info with Indexes