Re: An idea on faster CHAR field indexing

From: Giles Lean <giles(at)nemeton(dot)com(dot)au>
To: "Randall Parker" <randall(at)nls(dot)net>
Cc: "PostgreSQL-Dev" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: An idea on faster CHAR field indexing
Date: 2000-06-22 01:12:54
Message-ID: 11897.961636374@nemeton.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> I'm curious as to why the need for multiple passes. Is that true
> even in Latin 1 code pages?

Yes. Some locales want strings to be ordered first by ignoring any
accents on chracters, then using a tie-break on equal strings by doing
a comparison that includes the accents.

To take another of your points out of order: this is an obstacle that
Unicode doesn't resolve. Unicode gives you a character set capable of
representing characters from many different locales, but collation
order will remain locale specific.

> If not, this optimization could at least
> be used for code pages that don't require multiple passes.

... but due to the increased memory/disk space, this is likely not an
optimisation. Measurements needed, I'd suggest.

My only experience of this was tuning a sort utility, where the extra
time to convert the strings with strxfrm() and the large additional
memory requirement killed any advantage strcmp() had over strcoll().
Whether this would be the case for database indexes in general or
ideed ever I don't know.

> As for memory usage: I don't see the issue here. The translation to
> some collation sequence has to be done anyhow.

No; you can do the comparisons in multiple passes instead without
extra storage allocation. Using multiple passes will be efficient if
the comparisons mostly don't need the second pass, which I suspect is
typical.

> Writing one's own routine to do look-ups into a collation sequence
> table is a fairly trivial exercise.

True. But if you can't do character-by-character comparisons then
such a simplistic implementation will fail.

I hadn't mentioned this time around (but see the archives for the
recent discussion of LIKE) that there are locales with 2:1 and 1:2
mappings of characters too.

Regards,

Giles

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hiroshi Inoue 2000-06-22 01:15:01 RE: Big 7.1 open items
Previous Message Mikheev, Vadim 2000-06-22 01:11:03 RE: Big 7.1 open items