Skip site navigation (1) Skip section navigation (2)

Re: BUG #2317: Wrong sorting order for (VW)

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Tomas Klockar" <tomask(at)omicron(dot)se>
Cc: "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>,<pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #2317: Wrong sorting order for (VW)
Date: 2006-03-16 10:08:24
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA35143@algol.sollentuna.se (view raw or flat)
Thread:
Lists: pgsql-bugs
> 			The following bug has been logged online:
> 			
> 			Bug reference:      2317
> 			Logged by:          TomasKlockar
> 			Email address:      tomask(at)omicron(dot)se
> 			PostgreSQL version: 7.3.2/7.4.7
> 			Operating system:   linux(fedora)
> 			Description:        Wrong sorting order for (VW)
> 			Details:
> 			
> 			SELECT cname FROM clients ORDER BY cname;
> 			
> 			sorts W before V and that is a mistake.
> 			
> 			the result is
> 			
> 			V
> 			W
> 			WHI
> 			Vi
> 			Wi
> 			Volvo
> 			
> 			Correct order would be to place all vV 
> before all wW.
> 			
> 			I think the database was initialized 
> with UTF8 but it might 
> 			      
> 
> 		have been 
> 		    
> 
> 			ISO_8859-1, however the error is easily 
> repeatable and ill 
> 			      
> 
> 		be happy to 
> 		    
> 
> 			provide a dump.
> 			
> 			I tested this on 7.3.2 and 7.4.7 and 
> the error was repeatable.
> 			      
> 
> 		I think the most important thing we'd need to 
> know is what 
> 		locale the database was initialized with in 
> order to try to reproduce.
> 		
> 		If you put similar data in a file and use the 
> unix "sort" 
> 		command with the same locale, do you get the same order?
> 		    
> 
> 	
> 	That looks like the typical swedish locale, which sorts 
> V and W as the
> 	same character. Yes, that can be very annoying for some 
> of us :-), but
> 	that's the way it's defined.
> 	
> 	//Magnus
> 	  
> 
> I would say that sorting V and W as the same character may 
> work in a dictionary or sometimes when sorting names, however 
> in that case why don't it sort C and K as the same character? 
> and in some cases C and S as the same. 
> 
> The best sorting algorithm should sort all characters as 
> separate characters, and if you want fancy sorting you should 
> need to turn it on for a table.
> 
> Fancy sorting, like sorting Carlsson and Karlsson together 
> since they are equal in the same way as 
> Viktor=Wiktor=Victor=Wictor. Cesar and Sesar would also have 
> to be sorted together since they are pronounced the same in 
> swedish, and if you look in a phonebook you would find them 
> at the same place.
> 
> Now I get the english words was and vas sorted together when 
> I need them separated.
> 
> Currently the sortingfunction in postgreSQL have disqualified 
> itself from beeing used, and I do the sorting in java which 
> treat them as separate caracters.

Well, PostgreSQL uses the locale functionatlity provided by your OS, so you will need to talk to them. Perhaps they already ship a different locale definition that is more suitable for your needs that you can change to? 

If you select locale=C you will get the sort all chars as separate characters. It will break your sorting of åäö, because they are not in the correct sequence in LATIN1 (or UTF8 for that matter), but you can certainly do that. It's a matter of picking the correct locale when you initdb your database.

//Magnus

pgsql-bugs by date

Next:From: Andreas PflugDate: 2006-03-16 12:02:57
Subject: Re: BUG #2318: language
Previous:From: Tomas KlockarDate: 2006-03-16 10:03:59
Subject: Re: BUG #2317: Wrong sorting order for (VW)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group