Re: different sort order in windows and linux version

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Agent M <agentm(at)themactionfaction(dot)com>
Cc: Postgres general mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: different sort order in windows and linux version
Date: 2006-07-02 21:26:56
Message-ID: 20060702212656.GC8316@svana.org
Lists: pgsql-general pgsql-hackers

On Sun, Jul 02, 2006 at 12:25:43PM -0400, Agent M wrote:
> On Jul 2, 2006, at 6:13 AM, Martijn van Oosterhout wrote:
> >But I don't think anyone is actually considering importing ICU into the
> >postgres source tree, are they?
> Why not?

Because it's a project of similar size to postgres and probably nearly
as old and I don't think anyone here actually wants to maintain it.

I mean, we could incorporate the source for readline, openssl,
kerberos, the C library, but why? That project has maintainers already
and we only want to use it, not fork it.

> >If you drop the conversion stuff (because postgres already has that)
> >you're down to about 4MB.
> Why would you drop the ICU transcoding support instead of the existing
> postgres functions? Why the duplicated effort?

Because we would want to be bug-for-bug compatible with previous
releases. I suppose it would be possible if someone checked that the
end result is the same.

> Certain Japanese characters cannot make a reliable round-trip through
> Unicode. ICU uses UTF-16 as its store, so the Japanese folks won't be
> happy with an ICU-only solution. However, it would still be of great
> benefit to allow ICU to handle as much as possible, leaving the string
> encodings to the encoding experts.

We don't need a round-trip through Unicode, since we're only doing
one-way conversions for the purpose of collation.
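
To illustrate the point, here is a rough Python sketch (not postgres or
ICU code). The stored bytes are decoded to Unicode only to build a
comparison key; the key is thrown away after sorting and never
converted back, so a lossy mapping cannot corrupt the stored data. The
names collation_key and sort_encoded are made up for this example, and
the NFKD-plus-casefold key is just a stand-in for a real collator.

```python
import unicodedata


def collation_key(raw: bytes, encoding: str) -> str:
    """One-way conversion: decode to Unicode and derive a comparison key.

    The key is only used for ordering; it is never encoded back.
    NFKD + casefold is a placeholder for a real collation algorithm.
    """
    text = raw.decode(encoding)
    return unicodedata.normalize("NFKD", text).casefold()


def sort_encoded(values, encoding):
    """Sort byte strings by their Unicode-derived key, returning the
    original byte strings untouched."""
    return sorted(values, key=lambda raw: collation_key(raw, encoding))


rows = [
    "Zebra".encode("latin-1"),
    "\u00e9p\u00e9e".encode("latin-1"),
    "apple".encode("latin-1"),
]
ordered = sort_encoded(rows, "latin-1")
```

The values in ordered are the same byte strings that went in; only
their order was decided in Unicode.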

BTW, this site seems to have a good discussion of Japanese characters
and Unicode.

http://www.jbrowse.com/text/unij.html

> At the very least, it would be great to have ICU to handle encoding on
> a per-column basis (perhaps extending the text datatype with encoding
> info). Perhaps this would be a decent stopgap solution? The backend
> protocol would also need a version bump- currently, it converts all
> strings to a single encoding.

That's called SQL COLLATE support and that's an order of magnitude
harder than adding support for ICU. See previous discussion on -hackers.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
