Re: foreign_data test fails with non-C locale

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-11 10:54:02
Message-ID: 200901111254.03722.peter_e@gmx.net
Lists: pgsql-hackers
On Friday 09 January 2009 18:24:55 Tom Lane wrote:
> I don't think we are prepared to buy into a general policy that the
> regression tests should pass in *any* locale; maintaining a large
> number of variant expected-files isn't very practical.  However, the
> de facto policy is that we try to keep them passing in locales that
> are used by any of the regular developers.  I think it would be useful
> to have buildfarm members testing in a few common locales.

This called for an extensive test ... :-)

My glibc installation supplies 668 locales (locale -a), which appear to 
represent about 225 distinct language/country combinations.  (The rest are 
encoding variants.)
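
A minimal sketch of how such a count could be reproduced, assuming Python 3
on a POSIX system where `locale -a` is available. The function names are
hypothetical, and the rule for what counts as a "variant" (strip everything
after "." or "@") is an assumption, so the result need not match the figures
above exactly; alias names without a country part are counted as-is.

```python
import subprocess


def language_country(locale_name):
    """Strip the encoding (".utf8") and modifier ("@euro") from a locale name."""
    return locale_name.split(".", 1)[0].split("@", 1)[0]


def distinct_combinations(locale_names):
    """Return the sorted language/country combinations, skipping C and POSIX."""
    combos = {language_country(name) for name in locale_names if name}
    return sorted(combos - {"C", "POSIX"})


def installed_locales():
    """Read the installed locale names from `locale -a`."""
    result = subprocess.run(
        ["locale", "-a"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # some alias names are not valid UTF-8
        check=True,
    )
    return result.stdout.split()
```

Calling `len(installed_locales())` and
`len(distinct_combinations(installed_locales()))` gives the two numbers for
the local system.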

I ran the regression tests with all of them, and got 95 failures (out of 668).

In 15 of the 95 failures, initdb does not complete because the encoding 
specified by the locale is not supported by PostgreSQL.  But it appears that 
at least the xx_XX.utf8 variant works in each of these cases, so the language 
is supported in some way.

The remaining 80 failures are more-or-less linguistic issues that belong to 
the following 26 language/country combinations:

az_AZ	sorts k < q < l; Turkish i
br_FR	sorts ch separately
crh_UA	Turkish i
cs_CZ	sorts ch separately; sorts st = s
cy_GB	sorts ch separately
da_DK	sorts aa = å > z
es_EC	sorts ch separately
es_US	sorts ch separately
et_EE	sorts v = w
fo_FO	sorts aa = å > z
ha_NG	sorts sh separately
hsb_DE	sorts ch separately
ig_NG	sorts ch separately; sorts sh separately
ik_CA	sorts ch separately
kl_GL	sorts aa = å > z
nb_NO	sorts aa = å > z
nn_NO	sorts aa = å > z
om_ET	sorts ch separately (> z); sorts sh separately
om_KE	sorts ch separately (> z); sorts sh separately
pl_PL	(some other inexplicable sorting regression)
sk_SK	sorts ch separately; sorts st = s
sv_SE	sorts v = w
tk_TM	sorts v = w
tr_CY	Turkish i
tr_TR	Turkish i
tt_RU	sorts k < q < l

The "Turkish i" failures are in the tsearch tests.  I'm not completely 
comfortable that it's doing the right thing there.

We could easily get rid of the aa, ch, and v/w failures by adjusting the test 
data, since the particular strings used there are arbitrary anyway.  I propose 
to do that, and document these issues so that they can be avoided in future 
tests.

I'm not so worried about the other cases.

Also, considering that some of these alternative sorting rules appear to be 
controversial even among users of the language (e.g., we have had actual bug 
reports that the es_EC rule is wrong, and the sv_SE rule is also obsolete 
according to the language regulators), it might be interesting to write a 
small test program that can tell users how their current locale behaves in 
known corner cases.
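
A minimal sketch of what such a test program might look like, assuming
Python 3 and its standard `locale` module. The string pairs are illustrative
choices derived from the table above, not data from the regression tests, and
the names `CORNER_CASES` and `probe` are hypothetical. Only collation is
probed; the "Turkish i" case-conversion issue and the "st = s" rule are left
out.

```python
import locale

# (description, left, right, sign of collate(left, right) when the rule is active)
CORNER_CASES = [
    ("aa sorts after z", "aa", "z", 1),
    ("ch sorts as a separate letter", "ch", "cz", 1),
    ("sh sorts as a separate letter", "sh", "sz", 1),
    ("v and w sort as equal", "wa", "vb", -1),
    ("q sorts between k and l", "q", "l", -1),
]


def sign(n):
    """Reduce a comparison result to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def probe(collate=locale.strcoll):
    """Map each corner-case description to True if the rule appears active."""
    return {
        description: sign(collate(left, right)) == expected
        for description, left, right, expected in CORNER_CASES
    }


if __name__ == "__main__":
    try:
        # Adopt the collation rules from the environment (LC_ALL, LC_COLLATE, LANG).
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        print("environment locale is not supported; using the current setting")
    print("LC_COLLATE =", locale.setlocale(locale.LC_COLLATE))
    for description, active in probe().items():
        print(f"{'yes' if active else 'no ':3}  {description}")
```

Running it under, for example, `LC_ALL=da_DK.utf8` and again under `LC_ALL=C`
shows which rules the installed locale definition applies. `probe` accepts any
two-argument comparison function, so the checks themselves can be exercised
without depending on which locales are installed.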
