Re: Block B-Tree concept

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block B-Tree concept
Date: 2006-09-29 12:58:10
Message-ID: 1159534690.2767.348.camel@holly
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2006-09-29 at 10:51 +0100, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > If we want to keep the property that VACUUM doesn't re-evaluate index
> > entries, any proposal that doesn't keep track of every heap tuple
> > isn't going to work. I'll try to modify the design I had in mind so
> > that it does keep track of every heap tuple in some form.

The ideal situation is that we have one index pointer per block, so we
should look for that and optimize accordingly. We mark the heap block to
indicate how many block index pointers there are to it. If we have only
a single pointer, then VACUUM will only have to touch the index pointer
when the whole heap block is removed. In that case we have no
re-evaluation of the index AFAICS.

> After some thought:
>
> Imagine a normal B-tree just like what we have now. But when there is
> more than one tuple on the same heap page with consecutive index keys,
> we represent all of them in a single index tuple that contains the key
> of the first one of them, and a (run-length encoded) bitmap of the
> OffsetNumbers. We should get most of the space and I/O savings as with
> the original proposal, but we can vacuum it without re-evaluating index
> expressions.

The benefit we're seeking with a block index is that most INSERTs don't
write to the index. With that scheme we'd need to continually update the
index tuple so that it exactly represented the heap after each inserted
tuple, which is going to cause a hot block problem.

Much of that can go away if we have a bulk_heap_insert() which puts the
index entries in at the end of the block, though that needs some heavy
thought also.

Can we have this scheme enabled *only* for functional block indexes?
Normal case is likely to be monotonically ascending keys for real world
objects like Orders, Calls, Transactions etc.. It sounds like the
original proposal is still valid for those cases.

The bitmap would allow us to access heap rows faster in some
circumstances, I suppose.

Multi-block bitmaps do make this too much like bitmap indexes and the
use-case is very different. [Is there some kind of hybrid solution of
block & bitmap indexes?]

> It does change the format of an index tuple, unlike the original
> proposal. That adds some complexity. but it's doable.

Can we use an info bit to have two index tuple formats?
- single tuple (as now)
- multiple tuple block bitmap (as you suggest)

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2006-09-29 12:59:09 Re: Block B-Tree concept
Previous Message Andrew Dunstan 2006-09-29 12:55:13 Re: Backup and restore through JDBC