Re: BUG #2317: Wrong sorting order for (VW)

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Tomas Klockar" <tomask(at)omicron(dot)se>
Cc: "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #2317: Wrong sorting order for (VW)
Date: 2006-03-16 10:08:24
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA35143@algol.sollentuna.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> The following bug has been logged online:
>
> Bug reference: 2317
> Logged by: TomasKlockar
> Email address: tomask(at)omicron(dot)se
> PostgreSQL version: 7.3.2/7.4.7
> Operating system: linux(fedora)
> Description: Wrong sorting order for (VW)
> Details:
>
> SELECT cname FROM clients ORDER BY cname;
>
> sorts W before V and that is a mistake.
>
> the result is
>
> V
> W
> WHI
> Vi
> Wi
> Volvo
>
> Correct order would be to place all vV
> before all wW.
>
> I think the database was initialized
> with UTF8 but it might
>
>
> have been
>
>
> ISO_8859-1, however the error is easily
> repeatable and ill
>
>
> be happy to
>
>
> provide a dump.
>
> I tested this on 7.3.2 and 7.4.7 and
> the error was repeatable.
>
>
> I think the most important thing we'd need to
> know is what
> locale the database was initialized with in
> order to try to reproduce.
>
> If you put similar data in a file and use the
> unix "sort"
> command with the same locale, do you get the same order?
>
>
>
> That looks like the typical swedish locale, which sorts
> V and W as the
> same character. Yes, that can be very annoying for some
> of us :-), but
> that's the way it's defined.
>
> //Magnus
>
>
> I would say that sorting V and W as the same character may
> work in a dictionary or sometimes when sorting names, however
> in that case why don't it sort C and K as the same character?
> and in some cases C and S as the same.
>
> The best sorting algorithm should sort all characters as
> separate characters, and if you want fancy sorting you should
> need to turn it on for a table.
>
> Fancy sorting, like sorting Carlsson and Karlsson together
> since they are equal in the same way as
> Viktor=Wiktor=Victor=Wictor. Cesar and Sesar would also have
> to be sorted together since they are pronounced the same in
> swedish, and if you look in a phonebook you would find them
> at the same place.
>
> Now I get the english words was and vas sorted together when
> I need them separated.
>
> Currently the sortingfunction in postgreSQL have disqualified
> itself from beeing used, and I do the sorting in java which
> treat them as separate caracters.

Well, PostgreSQL uses the locale functionatlity provided by your OS, so you will need to talk to them. Perhaps they already ship a different locale definition that is more suitable for your needs that you can change to?

If you select locale=C you will get the sort all chars as separate characters. It will break your sorting of åäö, because they are not in the correct sequence in LATIN1 (or UTF8 for that matter), but you can certainly do that. It's a matter of picking the correct locale when you initdb your database.

//Magnus

Browse pgsql-bugs by date

  From Date Subject
Next Message Andreas Pflug 2006-03-16 12:02:57 Re: BUG #2318: language
Previous Message Tomas Klockar 2006-03-16 10:03:59 Re: BUG #2317: Wrong sorting order for (VW)