Re: different sort order in windows and linux version

From: "Tomi NA" <hefest(at)gmail(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Dragan Matic" <mlists(at)panforma(dot)co(dot)yu>, pgsql-general(at)postgresql(dot)org
Subject: Re: different sort order in windows and linux version
Date: 2006-06-30 17:29:12
Message-ID: d487eb8e0606301029k7a217d41p45e269d23ad200f2@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On 6/30/06, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
> On Fri, Jun 30, 2006 at 11:56:19AM +0200, Dragan Matic wrote:
> > I have two postgres servers, one on linux (fedora core 5), one on
> > windows, both are version 8.1.4.
> >
> > Both databases are initialized with locale Croatian and win1250 encoding.
> >
> > running pg_controldata on windows returns this
> >
> > LC_COLLATE: Croatian_Croatia.1250
> > LC_CTYPE: Croatian_Croatia.1250
> >
> > the same command on linux returns this
> >
> > LC_COLLATE: hr_HR
> > LC_CTYPE: hr_HR
> >
> > which is the same, I suppose.
>
> Well, apparently not. Postgres makes no attempt to understand
> collations nor try to determine whether they make sense. If you want to
> have the same collation on Windows and Linux, I think you're going to
> have trouble.

Croatian_Croatia and hr_HR are, in fact, the same in that there is no
other collation for the Croatian language. What's more, Dragan ran the
test using characters which are encoded exactly the same in cp1250,
utf8, iso8859-2, hell, probably even us-ascii. The fact remains that
different OSes collate differently, even for the same locale.
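
To see this for yourself outside the database, here is a minimal sketch
in Python that sorts the same strings through the OS collation routines.
The word list and the helper name sort_with_locale are made up for
illustration (they are not Dragan's test data), and which locale names
exist depends entirely on the machine: hr_HR.UTF-8 is a typical Linux
spelling and Croatian_Croatia.1250 the Windows one. A locale the OS does
not provide is reported as None rather than guessed at.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words with the OS collation rules for locale_name.

    Returns None if the OS does not provide that locale.
    """
    # Remember the current collation setting so it can be restored.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm hands the comparison to the C library, i.e. to the OS.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


# Hypothetical sample data, plain ASCII on purpose.
words = ["b", "A", "a", "B"]

# The "C" locale compares by byte value and exists everywhere.
print("C:", sort_with_locale(words, "C"))

# Locale names differ per OS; only the ones installed will return a list.
for name in ("hr_HR.UTF-8", "Croatian_Croatia.1250"):
    print(name + ":", sort_with_locale(words, name))
```

Running it on a Linux box and on a Windows box and comparing the two
outputs is the quickest way to check whether the two platforms agree
for a given set of strings.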

In C++, people use things like GTK, wxWidgets and GCL so that they
can think about "C++ code" instead of the platform they're coding on.
In Java, people use things like File.separator instead of "\" or "/"
so that they can think about "Java code".
There are dozens of examples like these, and most of the exceptions
stem from the influence of whoever held the monopoly at the time.
When you code in the RDBMS environment, you want to code in terms of
pgsql or Oracle or MySQL or whatever: you don't want to program for
Oracle on Solaris vs. Oracle on Linux vs. Oracle on Plan9 or...well,
you get the idea.
Not being able to depend on the engine to consistently collate
strings as simple as the ones Dragan listed is closer to a serious bug
(non-deterministic behaviour in otherwise deterministic functions)
than to an RFE, and is certainly nowhere near the "it's not our
problem" it is regularly made out to be. The OSes simply and
obviously don't do a good enough job of it.

> In the past there have existed patches to allow postgres to use ICU for
> locale support. It's supposedly not quite as fast, but you will be able
> to get consistent results across platforms.

Personally, I'd be perfectly happy with pgsql if I could choose to
make text operations up to 2-3x slower without the fuss of how it's
going to work on a certain platform, in each pgsql version.
Furthermore, compiling the server myself is not an option for live
usage: on my current project, I'm not even the one installing the
database servers...sending administrators a binary I configured and
compiled (on Windows, in this case!) and no one but me
tested...brrrr...I get the shivers just thinking about it.

If I sound harsh, please excuse me, but I feel like I'm the only one
who thinks these encoding problems (collation, upper/lowercase,
multiple languages in a single database) are serious...nobody seems to
share the sentiment. Ah well...

t.n.a.
