Re: prefix btree implementation

From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: prefix btree implementation
Date: 2005-10-05 22:40:43
Message-ID: di1a7o$2j6u$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


"Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org> wrote
> On Wed, Oct 05, 2005 at 12:50:41AM -0400, Tom Lane wrote:
>> Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu> writes:
>> > 1/ What types of prefix compression shall we support?
>>
>> Given the requirement of datatype independence, this idea seems a
>> complete nonstarter to me...
>
> How about having each type optionally provide the required routines?
> Thus we could provide them at least for the most common datatypes, and
> the system would continue working as currently for the rest (including
> user-defined types). Cross-column prefixes would be hard to handle I
> guess, as well as TOASTed data.
>

Yes, column-wise should not be difficult since it does require no knowledge
of the data types.

We can treat cross-column or incomplete-column share in two ways. One way
(binary-comparison) is that we just compare the items binaryly, i.e.,
without knowing what in fact is stored. The other way (datatype-comparison)
is we compare the items with some knowlege of the data types. For example,
suppose the index is defined on a varchar column and the examplar data look
like this:

{3|'aaa'}{4|'aaab'}{5|'aaabc'}

The binary-comparison way can't share prefix 'aaa' at all, because it will
see 3, 4 and 5 are totally different. The datatype-comparison way can share
the prefix 'aaa', but it has to know that each varchar column is associated
with a length header. When it compares, it ignores the header. The first way
is easy to implement, and works for some data types like integers, but no
acceptable I guess, since it even does not support varchar column prefix
sharing.

We can find a way to handle the above case, but it is better to find a
general way to handle any data types(include UDT). "Each type optionally
provide the required routines" could be a way, more details?

> One problem I do see is what happens if I need to insert a new tuple in
> the page that doesn't share the prefix. It obviously would have to be
> the leftmost or rightmost item on the page, but it's possible.
>

We do the prefix sharing when we build up index only, never on the fly.

Regards,
Qingqing

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ron Peacetree 2005-10-05 23:54:15 Re: [HACKERS] A Better External Sort?
Previous Message Richard Huxton 2005-10-05 20:58:09 Resultset duplicates (was Re: prefix btree implementation)