Skip site navigation (1) Skip section navigation (2)

FW: Win32 unicode vs ICU

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "PostgreSQL-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: FW: Win32 unicode vs ICU
Date: 2005-08-04 21:33:12
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCE094656@algol.sollentuna.se (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
I just realised this mail didn't go through. Probably because it was too
large for -hackers. So: repost to -patches. Sorry about that. If it's a
duplicate, even more sorry, but I couldn't find it in the archives.

(This may explain that nobody answered me :P)

//Magnus
 

> -----Original Message-----
> From: Magnus Hagander 
> Sent: Sunday, July 31, 2005 2:09 PM
> To: PostgreSQL-development
> Cc: pgsql-hackers-win32(at)postgresql(dot)org
> Subject: Win32 unicode vs ICU
> 
> Hi!
> 
> I've been working with Palles ICU patch to make it work on 
> win32, and I believe I have it done. While doing it I noticed 
> that ICU basically converts to UTF16 and back - I previously 
> thought it worked on UTF8 strings. Based on this I also tried 
> out an implementation for the win32-unicode problem that does 
> *not* require ICU. It uses the win32 native functions to map 
> to utf16 and back, and then to process the text there. And I 
> got through with much less code than the ICU version, while 
> doing the same thing.
> 
> I am unsure of how to proceed. As I see it there are three paths:
> 1) Use native win32 functionality only on win32
> 2) Use ICU functionality only on win32
> 3) Allow both ICU and native functionality, compile time 
> switch --with-icu (same as unix with the ICU patch)
> 
> 
> The main downsides of ICU vs the native ones are:
> * ICU does not accept win32 locale names. When doing 
> setlocale("sv_se"), for example, win32 will return this in 
> later calls as "Swedish_Sweden.1252". To get around this in 
> the ICU patch, I had to implement a lookup map that converts 
> it back to sv_se for ICU.
> 
> * ICU is yet another build and runtime dependency, and a 
> large one (comes in at 11Mb for the DLL files alone in the 
> win32 download)
> 
> 
> I guess that the main upside of it is that we'd get 
> constistent behaviour - in case there are issues with either 
> ICU or win32 native they'd otherwise differ. And only one new 
> codepath. But we already live with the platform-inconsistency today...
> 
> Another upside is that it handles more encodings in ICU - my 
> native implementation does *only* UTF8 and relies on existing 
> functionality to deal with other encodings. It could of 
> course be extended if necessary, but from what I can tell 
> UTF8 is the big one.
> 
> 
> 
> I have attached both patches. For the native version, only 
> win32_utf8.patch is required.  For the ICU version, 
> icu_win32.patch is needed and also the files 
> localemap.c,localemap.pl, iso639 and iso3166 needs to go in 
> src/backend/port/win32. (the localemap needs to be updated to 
> do a better-than-linear search, but I wanted to include an example)
> 
> 
> Thoughts on the options?
> 
> 
> And anohter question - my native patch touches the same 
> functions as the ICU patch. Can somebody who knows the 
> internals confirm or deny that these are all the required 
> locations, or do we need to modify more?
> 
> (I have run simple tests in swedish locale and both behave 
> the same and correct, but I'm unsure of exactly how much 
> would be affected)
> 
> Finally, the win32 patch also changes the normal path to use 
> strncoll(). The comment above the function states that we'd 
> like to use strncoll but it's not available. Well, on win32 
> it is, so it should provide a speedup on win32. It is 
> currently not included in the ICU patch, but should probably 
> be included whichever path we'd chose.
> 
> 
> //Magnus
> 

Attachment: localemap.c
Description: application/octet-stream (1.4 KB)
Attachment: localemap.pl
Description: application/octet-stream (1.4 KB)
Attachment: icu_win32.patch
Description: application/octet-stream (21.8 KB)
Attachment: win32_utf8.patch
Description: application/octet-stream (5.9 KB)

Responses

pgsql-hackers by date

Next:From: Ian BurrellDate: 2005-08-04 22:16:40
Subject: Re: Solving the OID-collision problem
Previous:From: Mark WoodwardDate: 2005-08-04 21:22:15
Subject: Re: US Census database (Tiger 2004FE) - 4.4G

pgsql-patches by date

Next:From: Qingqing ZhouDate: 2005-08-05 01:57:23
Subject: Re: prevent encoding conversion recursive error
Previous:From: Tom LaneDate: 2005-08-04 17:15:28
Subject: Re: Enhanced containment selectivity function

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group