Re: foreign_data test fails with non-C locale

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-11 10:54:02
Message-ID: 200901111254.03722.peter_e@gmx.net
Lists: pgsql-hackers

On Friday 09 January 2009 18:24:55 Tom Lane wrote:
> I don't think we are prepared to buy into a general policy that the
> regression tests should pass in *any* locale; maintaining a large
> number of variant expected-files isn't very practical. However, the
> de facto policy is that we try to keep them passing in locales that
> are used by any of the regular developers. I think it would be useful
> to have buildfarm members testing in a few common locales.

This called for an extensive test ... :-)

My glibc installation supplies 668 locales (locale -a), which appear to
represent about 225 distinct language/country combinations. (The rest are
encoding variants.)
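
As a rough illustration of how such a count can be made, here is a minimal Python sketch that collapses `locale -a` output to language/country pairs. It assumes that names follow the glibc language_COUNTRY.encoding@modifier pattern and that modifier variants (such as @euro) should not count separately; the exact counting method behind the figures above is not stated, so the numbers it prints need not match. The function names are made up for this example.

```python
import subprocess

def language_country(name):
    """Strip the encoding (.utf8) and the modifier (@euro) from a locale name."""
    return name.split("@", 1)[0].split(".", 1)[0]

def distinct_combinations(names):
    """Return the set of language_COUNTRY pairs found in a list of locale names."""
    combos = set()
    for name in names:
        base = language_country(name)
        # Keep only language_COUNTRY names; this skips C, POSIX,
        # and plain aliases such as "german".
        if "_" in base:
            combos.add(base)
    return combos

if __name__ == "__main__":
    try:
        # latin-1 decoding never fails, even if some alias contains non-ASCII bytes.
        out = subprocess.run(["locale", "-a"], capture_output=True,
                             encoding="latin-1", check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        out = ""  # no usable "locale" command on this system
    names = out.split()
    print(len(names), "locales,",
          len(distinct_combinations(names)), "language/country combinations")
```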

I ran the regression tests with all of them, and got 95 failures (out of 668).

In 15 of the 95 failures, initdb does not complete because the encoding
specified by the locale is not supported by PostgreSQL. But at least
xx_XX.utf8 appears to work in each of these cases, so the language is
supported in some way.

The remaining 80 failures are more-or-less linguistic issues that belong to
the following 26 language/country combinations:

az_AZ   sorts k < q < l; Turkish i
br_FR   sorts ch separately
crh_UA  Turkish i
cs_CZ   sorts ch separately; sorts st = s
cy_GB   sorts ch separately
da_DK   sorts aa = å > z
es_EC   sorts ch separately
es_US   sorts ch separately
et_EE   sorts v = w
fo_FO   sorts aa = å > z
ha_NG   sorts sh separately
hsb_DE  sorts ch separately
ig_NG   sorts ch separately; sorts sh separately
ik_CA   sorts ch separately
kl_GL   sorts aa = å > z
nb_NO   sorts aa = å > z
nn_NO   sorts aa = å > z
om_ET   sorts ch separately (> z); sorts sh separately
om_KE   sorts ch separately (> z); sorts sh separately
pl_PL   (some other inexplicable sorting regression)
sk_SK   sorts ch separately; sorts st = s
sv_SE   sorts v = w
tk_TM   sorts v = w
tr_CY   Turkish i
tr_TR   Turkish i
tt_RU   sorts k < q < l

The "Turkish i" failures are in the tsearch tests. I'm not entirely
convinced that tsearch is doing the right thing there.

We could easily get rid of the aa, ch, and v/w failures by adjusting the test
data, since the particular values used are arbitrary anyway. I propose to do
that, and to document these issues so that they can be avoided in future tests.

I'm not so worried about the other cases.

Also, considering that some of these alternative sorting rules appear to be
controversial even among users of the language (e.g., we have had actual bug
reports that the es_EC rule is wrong, and the sv_SE rule is also obsolete
according to the language regulators), it might be interesting to write a
small test program that can tell users how their current locale behaves in
known corner cases.
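
A minimal sketch of what such a test program might look like, written in Python rather than C for brevity. It only covers the collation rules listed above (not the "Turkish i" case conversion issue or the st = s rule), the probe strings are my own choice, and the function name is hypothetical. Each probe compares two short strings with strcoll() and reports whether the current LC_COLLATE setting applies the rule in question.

```python
import locale

def probe_collation(collate=locale.strcoll):
    """Report which corner-case sorting rules a comparison function applies.

    collate(a, b) must return a negative, zero, or positive number,
    like strcoll().
    """
    return {
        # da_DK, nb_NO, ...: "aa" is treated like "å", which sorts after "z".
        "aa sorts after z": collate("aa", "z") > 0,
        # cs_CZ, es_EC, ...: "ch" is its own letter, sorting after every "c?" word.
        "ch sorts as a separate letter": collate("ch", "cz") > 0,
        # ha_NG, ig_NG, ...: same idea for "sh".
        "sh sorts as a separate letter": collate("sh", "sz") > 0,
        # sv_SE, et_EE, ...: if v and w are equal at the first level,
        # the second letter decides and "wa" sorts before "vb".
        "v = w": collate("wa", "vb") < 0,
        # az_AZ, tt_RU: k < q < l.
        "q sorts before l": collate("q", "l") < 0,
    }

if __name__ == "__main__":
    try:
        # Pick up the collation locale from the environment (LC_ALL, LC_COLLATE, LANG).
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        print("locale from the environment is not available; keeping current setting")
    print("LC_COLLATE =", locale.setlocale(locale.LC_COLLATE))
    for rule, applies in probe_collation().items():
        print(f"{rule:32} {'yes' if applies else 'no'}")
```

Run under the C locale, every probe should report "no"; run with, say, LC_COLLATE set to one of the locales in the table, the corresponding rules should flip to "yes" if the installed locale definition still implements them.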
