An idea on faster CHAR field indexing

From: "Randall Parker" <randall(at)nls(dot)net>
To: "PostgreSQL-Dev" <pgsql-hackers(at)postgresql(dot)org>
Subject: An idea on faster CHAR field indexing
Date: 2000-06-21 20:14:58
Message-ID: 20123500514726@mail.nls.net
Lists: pgsql-hackers

Hi folks,

This is my first post to your list. I've been reading it for about a week. I like the quality of the developers here and think
this portends well for the future of Postgres.

Anyway, an idea. I'm not sure whether RDBMSs already implement this technique internally. But in case they don't, and in case
you've never thought of it, here's something I just thought of:

CHAR fields have different sorting (aka collation) rules for each code page. E.g., the very fact that A comes before B is
something that the collation info for a given code page has to specify. Just because a character has a lower value
than another character in its encoding in a given code page doesn't mean it gets sorted first.
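
A small illustration of that last point, as a sketch only: Python's default string ordering compares code point values, and `str.casefold` is used here merely as a stand-in for a real collation rule (an assumption for the example, not how any particular code page sorts).

```python
# Sorting by raw code point value versus sorting by a simple collation rule.
words = ["apple", "Zebra", "banana"]

# Default ordering compares code points: 'Z' (90) is lower than 'a' (97).
by_code_point = sorted(words)

# Stand-in collation: ignore case, so 'Z' sorts with 'z', after 'a' and 'b'.
by_folded = sorted(words, key=str.casefold)

print(by_code_point)  # ['Zebra', 'apple', 'banana']
print(by_folded)      # ['apple', 'banana', 'Zebra']
```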

So let me cut to the chase: I'm thinking that rather than store the actual character sequence of each field (or some
subset of a field) in an index why not translate the characters into their collation sequence values and store _those_ in
the index?

The idea is to reduce the number of times a string has to be converted to its mathematical sorting order representation.
Don't do it every time two strings get compared. Do it once, when a record is inserted or that field is updated.
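
The proposal can be sketched roughly as follows. Everything here is hypothetical: `COLLATION_WEIGHT` is a made-up table (letters ordered case-insensitively, lowercase first), and `SortKeyIndex` is a toy sorted list, not a real index structure. Real collations are usually more involved than one weight per character. The standard C library's `strxfrm` performs a transformation of this general kind for the current locale.

```python
from bisect import insort

# Hypothetical collation weights: a < A < b < B < ... < z < Z.
# This is not any real locale's table, just an ordering that differs
# from raw code point order.
COLLATION_WEIGHT = {}
for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz"):
    COLLATION_WEIGHT[ch] = 2 * i + 1
    COLLATION_WEIGHT[ch.upper()] = 2 * i + 2


def sort_key(text: str) -> bytes:
    """Translate a string into its sequence of collation weights."""
    return bytes(COLLATION_WEIGHT[ch] for ch in text)


class SortKeyIndex:
    """Toy ordered index that stores precomputed sort keys."""

    def __init__(self):
        self.entries = []  # kept sorted as (key, original value) pairs

    def insert(self, value: str) -> None:
        # The translation happens once, at insert time ...
        insort(self.entries, (sort_key(value), value))

    def values_in_order(self):
        # ... so ordering afterwards relies on plain byte comparisons.
        return [value for _, value in self.entries]


index = SortKeyIndex()
for name in ["banana", "Apple", "cherry", "apple"]:
    index.insert(name)

print(index.values_in_order())  # ['apple', 'Apple', 'banana', 'cherry']
```

The trade-off in this sketch is that the index stores the key in addition to (or instead of) the original characters, so lookups must translate the search string the same way before comparing.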

Is this already done? Or is it not such a good idea for some reason?

I'd consider this idea of greater value in something like Unicode. For 16-bit Unicode the lookup table to find each
character's ordinal value (or sorting value, whatever it's called) is 128k (65,536 entries at 2 bytes each), right? Doing a
bunch of look-ups into that can't be good for the L1 and L2 caches in a processor.
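
The 128k figure can be checked with a quick calculation. It assumes a flat table holding one 16-bit weight per 16-bit code value; that layout is an assumption for the arithmetic, and real collation tables may be organized differently.

```python
# Size of a flat weight table for 16-bit characters.
CODE_POINTS = 2 ** 16      # every value a 16-bit code unit can take
BYTES_PER_WEIGHT = 2       # assumption: one 16-bit weight per code point

table_bytes = CODE_POINTS * BYTES_PER_WEIGHT
print(table_bytes, "bytes =", table_bytes // 1024, "KiB")  # 131072 bytes = 128 KiB
```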
