Re: daitch_mokotoff module

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dag Lem <dag(at)nimrod(dot)no>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: daitch_mokotoff module
Date: 2022-01-03 16:34:36
Lists: pgsql-hackers

Dag Lem <dag(at)nimrod(dot)no> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> (We do have methods for dealing with non-ASCII test cases, but
>> I can't see that this patch is using any of them.)

> I naively assumed that tests would be run in a UTF8 environment.

Nope, not necessarily.

Our current best practice for this is to separate out encoding-dependent
test cases into their own test script, and guard the script with an
initial test on database encoding. You can see an example in
and the two associated expected-files. It's a good idea to also cover
as much as you can with pure-ASCII test cases that will run regardless
of the prevailing encoding.
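
As a rough sketch of what such a guard can look like (assuming a psql
new enough to support \if, and with the test query below being purely
illustrative rather than taken from any existing script):

```sql
-- Encoding-dependent tests live in their own script.
-- Bail out early unless the database encoding is UTF8.
SELECT getdatabaseencoding() <> 'UTF8' AS skip_test \gset
\if :skip_test
\quit
\endif

-- Everything below runs only in a UTF8 database.
-- (Illustrative placeholder query, not from a real test file.)
SELECT length('Müller');
```

With this layout, one expected-file holds the full output for a UTF8
database and the other holds just the output up to the \quit, so the
test passes either way.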

> Running "ack -l '[\x80-\xff]'" in the contrib/ directory reveals that
> two other modules are using UTF8 characters in tests - citext and
> unaccent.

Yeah, neither of those has been upgraded to said best practice.
(If you feel like doing the legwork to improve that situation,
that'd be great.)

> Looking into the unaccent module, I don't quite understand how it will
> work with various encodings, since it doesn't seem to decode its input -
> will it fail if run under anything but ASCII or UTF8?

Its Makefile seems to be forcing the test database to use UTF8.
I think this is a less-than-best-practice choice, because then
we have zero test coverage for other encodings; but it does
prevent test failures.

regards, tom lane
