Re: An idea on faster CHAR field indexing

From: "Randall Parker" <randall(at)nls(dot)net>
To: "Giles Lean" <giles(at)nemeton(dot)com(dot)au>
Cc: "PostgreSQL-Dev" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: An idea on faster CHAR field indexing
Date: 2000-06-22 01:45:20
Message-ID: 01425928136378@mail.nls.net
Lists: pgsql-hackers

Giles,

On Thu, 22 Jun 2000 11:12:54 +1000, Giles Lean wrote:

>Yes. Some locales want strings to be ordered first by ignoring any
>accents on characters, then using a tie-break on equal strings by doing
>a comparison that includes the accents.

I guess I don't see how this is really any different. Why order first by the character and second by the accent? For instance,
if you know the relative order of the various forms of "o", then just give them all successive numbers and do a single-pass
sort. You just have to make sure that every number in that set is greater than the number you assign to "n" and less than
the number you assign to "p".
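
To make the comparison concrete, here is a minimal Python sketch of both schemes: the single-pass sort described above and the two-level (base letter first, accent as tie-break) sort Giles describes. The alphabet, the weights, and the function names are all made up for illustration; they are not taken from any real locale or from PostgreSQL.

```python
import unicodedata

# Hypothetical character order: every form of "o" gets a successive
# weight that falls between the weights of "n" and "p".
ALPHABET = unicodedata.normalize("NFC", "abcdefghijklmnoóôöpqrstuvwxyz")
WEIGHT = {ch: i for i, ch in enumerate(ALPHABET)}


def single_pass_key(s):
    """One weight per character, compared left to right in a single pass."""
    return [WEIGHT[ch] for ch in unicodedata.normalize("NFC", s)]


def strip_accents(s):
    """Drop combining marks so that accented forms map to their base letter."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def two_level_key(s):
    """Compare with accents ignored first, then use the accents as a tie-break."""
    return (single_pass_key(strip_accents(s)), single_pass_key(s))


# "r\u00f4le" is "role" with a circumflex on the "o".
words = ["rose", "r\u00f4le", "role"]

print(sorted(words, key=single_pass_key))
print(sorted(words, key=two_level_key))
```

With these assumed weights the two sorts return different orders for this input: the single-pass key decides at the second character, so the accented word lands after "rose", while the two-level key only looks at the accent once the unaccented strings tie.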

>To take another of your points out of order: this is an obstacle that
>Unicode doesn't resolve. Unicode gives you a character set capable of
>representing characters from many different locales, but collation
>order will remain locale specific.

With Unicode you have to have a collation order that cuts across what used to be separate character sets in separate code
pages.

>... but due to the increased memory/disk space, this is likely not an
>optimisation. Measurements needed, I'd suggest.

But why is there increased memory and disk space? Aren't the fields that go into an index already stored twice today?
Or does the index just contain a series of references to records and nothing else?
