Skip site navigation (1) Skip section navigation (2)

Re: Block B-Tree concept

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block B-Tree concept
Date: 2006-09-29 10:33:28
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
Martijn van Oosterhout wrote:
> On Fri, Sep 29, 2006 at 10:51:32AM +0100, Heikki Linnakangas wrote:
>> After some thought:
>> Imagine a normal B-tree just like what we have now. But when there is
>> more than one tuple on the same heap page with consecutive index keys,
>> we represent all of them in a single index tuple that contains the key
>> of the first one of them, and a (run-length encoded) bitmap of the
>> OffsetNumbers. We should get most of the space and I/O savings as with
>> the original proposal, but we can vacuum it without re-evaluating index
>> expressions.
> I think something like this has been discussed before. Where an index
> tuple currently has the key values followed by a ctid, simply change
> that so that it can be a list of ctid's, in order.

Actually it's t_tid followed by t_info (which is size + flags) followed 
by key values.

> This works on having the same key, but doesn't care if the tuples are
> on the same page. It used to be not possible because of how to handle
> forward and backward scanning within an index entry during updates. I
> think with the new "page at a time" index scanning, this got a lot
> easier.

I'm not very interested in the case where you have a lot of equal keys, 
I think the bitmap index am is more suitable for that. The Block B-tree 
could be used whenever you have a clustered table (including unique 
indexes). Some DBMSs have index-organized-tables for the same use case.

When I tested a prototype of the original idea with TPC-C (DBT-2) data, 
a block index on the order_line table was under 2% of the size of a 
normal B-tree. That's very close to a best-case scenario; narrow rows 
and a completely clustered table. I'm aiming at that order of magnitude 
reduction in storage size.

> One issue is that a single index page could hold more than 1000 index
> entries, which might cause problems for callers.

I don't think that's a problem.

Heikki Linnakangas

In response to

pgsql-hackers by date

Next:From: Marlon PetryDate: 2006-09-29 12:10:50
Subject: Re: Backup and restore through JDBC
Previous:From: Martijn van OosterhoutDate: 2006-09-29 10:19:07
Subject: Re: New version of money type

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group