plpython_unicode test (was Re: buildfarm / handling (undefined) locales)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: plpython_unicode test (was Re: buildfarm / handling (undefined) locales)
Date: 2014-06-01 20:45:17
Message-ID: 6789.1401655517@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
> On 13.5.2014 20:58, Tom Lane wrote:
>> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>>> Yeah, not really what we were shooting for. I've fixed this by
>>> defining the missing locales, and indeed - magpie now fails in
>>> plpython tests.

>> I saw that earlier today (tho right now the buildfarm server seems
>> to not be responding :-(). Probably we should use some
>> more-widely-used character code in that specific test?

> Any idea what other character could be used in those tests? ISTM fixing
> this universally would mean using ASCII characters - the subset of UTF-8
> common to all the encodings. But I'm afraid that'd contradict the very
> purpose of those tests ...

We really ought to resolve this issue so that we can get rid of some of
the red in the buildfarm. ISTM there are three possible approaches:

1. Decide that we're not going to support running the plpython regression
tests under "weird" server encodings, in which case Tomas should just
remove cs_CZ.WIN-1250 from the set of encodings his buildfarm animals
test. Don't much care for this, but it has the attraction of being
minimal work.

2. Change the plpython_unicode test to use some ASCII character in place
of \u0080. We could keep on using the \u syntax to create the character,
but as stated above, this still seems like it's losing a significant
amount of test coverage.

3. Try to select some "more portable" non-ASCII character, perhaps U+00A0
(non breaking space) or U+00E1 (a-acute). I think this would probably
work for most encodings but it might still fail in the Far East. Another
objection is that the expected/plpython_unicode.out file would contain
that character in UTF8 form. In principle that would work, since the test
sets client_encoding = utf8 explicitly, but I'm worried about accidental
corruption of the expected file by text editors, file transfers, etc.
(The current usage of U+0080 doesn't suffer from this risk because psql
special-cases printing of multibyte UTF8 control characters, so that we
get exactly "\u0080".)

Thoughts?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-06-01 21:35:53 Re: plpython_unicode test (was Re: buildfarm / handling (undefined) locales)
Previous Message Maxence Ahlouche 2014-06-01 20:06:54 Re: [GSoC] Clustering in MADlib - status update