Skip site navigation (1) Skip section navigation (2)

Re: FW: Win32 unicode vs ICU

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: FW: Win32 unicode vs ICU
Date: 2005-08-24 15:59:08
Message-ID: 200508241559.j7OFx8008679@candle.pha.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
This has been saved for the 8.2 release:

	http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Magnus Hagander wrote:
> I just realised this mail didn't go through. Probably because it was too
> large for -hackers. So: repost to -patches. Sorry about that. If it's a
> duplicate, even more sorry, but I couldn't find it in the archives.
> 
> (This may explain that nobody answered me :P)
> 
> //Magnus
>  
> 
> > -----Original Message-----
> > From: Magnus Hagander 
> > Sent: Sunday, July 31, 2005 2:09 PM
> > To: PostgreSQL-development
> > Cc: pgsql-hackers-win32(at)postgresql(dot)org
> > Subject: Win32 unicode vs ICU
> > 
> > Hi!
> > 
> > I've been working with Palles ICU patch to make it work on 
> > win32, and I believe I have it done. While doing it I noticed 
> > that ICU basically converts to UTF16 and back - I previously 
> > thought it worked on UTF8 strings. Based on this I also tried 
> > out an implementation for the win32-unicode problem that does 
> > *not* require ICU. It uses the win32 native functions to map 
> > to utf16 and back, and then to process the text there. And I 
> > got through with much less code than the ICU version, while 
> > doing the same thing.
> > 
> > I am unsure of how to proceed. As I see it there are three paths:
> > 1) Use native win32 functionality only on win32
> > 2) Use ICU functionality only on win32
> > 3) Allow both ICU and native functionality, compile time 
> > switch --with-icu (same as unix with the ICU patch)
> > 
> > 
> > The main downsides of ICU vs the native ones are:
> > * ICU does not accept win32 locale names. When doing 
> > setlocale("sv_se"), for example, win32 will return this in 
> > later calls as "Swedish_Sweden.1252". To get around this in 
> > the ICU patch, I had to implement a lookup map that converts 
> > it back to sv_se for ICU.
> > 
> > * ICU is yet another build and runtime dependency, and a 
> > large one (comes in at 11Mb for the DLL files alone in the 
> > win32 download)
> > 
> > 
> > I guess that the main upside of it is that we'd get 
> > constistent behaviour - in case there are issues with either 
> > ICU or win32 native they'd otherwise differ. And only one new 
> > codepath. But we already live with the platform-inconsistency today...
> > 
> > Another upside is that it handles more encodings in ICU - my 
> > native implementation does *only* UTF8 and relies on existing 
> > functionality to deal with other encodings. It could of 
> > course be extended if necessary, but from what I can tell 
> > UTF8 is the big one.
> > 
> > 
> > 
> > I have attached both patches. For the native version, only 
> > win32_utf8.patch is required.  For the ICU version, 
> > icu_win32.patch is needed and also the files 
> > localemap.c,localemap.pl, iso639 and iso3166 needs to go in 
> > src/backend/port/win32. (the localemap needs to be updated to 
> > do a better-than-linear search, but I wanted to include an example)
> > 
> > 
> > Thoughts on the options?
> > 
> > 
> > And anohter question - my native patch touches the same 
> > functions as the ICU patch. Can somebody who knows the 
> > internals confirm or deny that these are all the required 
> > locations, or do we need to modify more?
> > 
> > (I have run simple tests in swedish locale and both behave 
> > the same and correct, but I'm unsure of exactly how much 
> > would be affected)
> > 
> > Finally, the win32 patch also changes the normal path to use 
> > strncoll(). The comment above the function states that we'd 
> > like to use strncoll but it's not available. Well, on win32 
> > it is, so it should provide a speedup on win32. It is 
> > currently not included in the ICU patch, but should probably 
> > be included whichever path we'd chose.
> > 
> > 
> > //Magnus
> > 

Content-Description: win32_utf8.patch

[ Attachment, skipping... ]

Content-Description: icu_win32.patch

[ Attachment, skipping... ]

Content-Description: localemap.pl

[ Attachment, skipping... ]

Content-Description: localemap.c

[ Attachment, skipping... ]

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

In response to

pgsql-hackers by date

Next:From: Joshua D. DrakeDate: 2005-08-24 16:01:45
Subject: Re: TODO questions
Previous:From: Hannu KrosingDate: 2005-08-24 15:41:20
Subject: Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

pgsql-patches by date

Next:From: Kevin McArthurDate: 2005-08-24 16:58:59
Subject: FreeBSD ICU was Win32 unicode vs ICU
Previous:From: Andrew DunstanDate: 2005-08-24 14:28:31
Subject: Re: PL/Perl regression tests with use_strict

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group