Re: locale-specific sort algorithms undocumented?

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John Gunther <mail(at)bucksvsbytes(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: locale-specific sort algorithms undocumented?
Date: 2004-07-26 08:49:12
Message-ID: 200407261049.12346.peter_e@gmx.net
Lists: pgsql-general

Tom Lane wrote:
> > I now find that sorting is very different with that setting: It
> > appears, through trial and error, that all non-alphanumeric
> > characters are completely ignored by ORDER BY.
>
> I doubt they are ignored completely, but they probably are ignored in
> the first-order comparison.

The way this more or less works is:

First pass: letters, numbers
Second pass: accents
Third pass: upper/lower case
Fourth pass: punctuation characters

Each later pass is consulted only to break ties left by the earlier ones.
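Here is a toy Python sketch of the pass-by-pass idea. It is a deliberate simplification, not the ISO/IEC 14651 or UCA weight tables that an operating system's locale actually uses. `toy_sort_key` is a hypothetical helper that splits each string into four levels: base characters, accents, case, and punctuation with its position. Python's tuple comparison then consults a later level only when all earlier levels tie.

```python
import unicodedata

def toy_sort_key(s: str) -> tuple:
    """Hypothetical four-pass sort key; NOT real ISO/IEC 14651 weights."""
    letters, accents, case, punct = [], [], [], []
    for pos, ch in enumerate(s):
        if ch.isalnum():
            # NFD splits e.g. 'é' into 'e' plus a combining acute accent.
            decomposed = unicodedata.normalize("NFD", ch)
            base = decomposed[0]
            letters.append(base.lower())    # pass 1: letters, numbers
            accents.append(decomposed[1:])  # pass 2: accents ('' if none)
            case.append(base.isupper())     # pass 3: case (lower sorts first)
        else:
            punct.append((pos, ch))         # pass 4: punctuation, with position
    # Tuples compare left to right, so each later pass only breaks ties.
    return (tuple(letters), tuple(accents), tuple(case), tuple(punct))

words = ["co-op", "Coop", "c\u00f4te", "coop", "cot\u00e9", "cote"]
print(sorted(words, key=toy_sort_key))
# ['coop', 'co-op', 'Coop', 'cote', 'coté', 'côte']
```

"co-op" lands between "coop" and "Coop" because the hyphen matters only at the fourth pass, after case has already been compared. A plain code-point sort (`sorted(words)`) would instead put "Coop" first.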

This is all enshrined in various standards such as ISO/IEC 14651,
national standards based on it, and independent technical standards
such as the Unicode Collation Algorithm.

The latter in fact allows what many people appear to be looking for: a
"variable weighting" option that allows you to promote punctuation
characters to the first pass. But I don't think any operating system
implements that yet.
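The following rough sketch shows the "variable weighting" switch, again as toy Python. `uca_style_key` and its `variable` parameter are hypothetical names. The two option values borrow the UCA's terms. "shifted" defers punctuation to the last pass, matching the behavior described above. "non-ignorable" gives punctuation a first-pass weight below all letters and digits. The weights themselves are invented for illustration and are not the real DUCET or locale data.

```python
import unicodedata

def uca_style_key(s: str, variable: str = "shifted") -> tuple:
    """Hypothetical multi-pass key with a UCA-style variable-weighting option."""
    if variable not in ("shifted", "non-ignorable"):
        raise ValueError(f"unsupported variable weighting: {variable!r}")
    primary, secondary, tertiary, quaternary = [], [], [], []
    for pos, ch in enumerate(s):
        if ch.isalnum():
            decomposed = unicodedata.normalize("NFD", ch)
            primary.append((1, decomposed[0].lower()))  # letters/digits rank 1
            secondary.append(decomposed[1:])            # accents
            tertiary.append(decomposed[0].isupper())    # case
        elif variable == "non-ignorable":
            # Promote punctuation to the first pass, below every letter/digit.
            primary.append((0, ch))
        else:
            # Shifted: punctuation only breaks ties, in the fourth pass.
            quaternary.append((pos, ch))
    return (tuple(primary), tuple(secondary), tuple(tertiary), tuple(quaternary))

words = ["cop", "co-op", "coop", "co op"]
print(sorted(words, key=uca_style_key))
# ['coop', 'co op', 'co-op', 'cop']
print(sorted(words, key=lambda w: uca_style_key(w, "non-ignorable")))
# ['co op', 'co-op', 'coop', 'cop']
```

With punctuation shifted, "co op" and "co-op" are tied with "coop" until the last pass. With punctuation non-ignorable, the space and hyphen are compared immediately and sort ahead of the letters.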

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
