Re: daitch_mokotoff module

From:	Dag Lem <dag(at)nimrod(dot)no>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: daitch_mokotoff module
Date:	2022-01-03 13:07:09
Message-ID:	ygeo84tvugy.fsf@sid.nimrod.no
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
>> Erm, it looks like something weird is happening somewhere in cfbot's
>> pipeline, because Dag's patch says:
>
>> +SELECT daitch_mokotoff('Straßburg');
>> + daitch_mokotoff
>> +-----------------
>> + 294795
>> +(1 row)
>
> ... so, that test case is guaranteed to fail in non-UTF8 encodings,
> I suppose? I wonder what the LANG environment is in that cfbot
> instance.
>
> (We do have methods for dealing with non-ASCII test cases, but
> I can't see that this patch is using any of them.)
>
> regards, tom lane
>

I naively assumed that tests would be run in a UTF8 environment.

Running "ack -l '[\x80-\xff]'" in the contrib/ directory reveals that
two other modules use non-ASCII (UTF8-encoded) characters in their
tests: citext and unaccent.

The citext tests seem to be commented out - "Multibyte sanity
tests. Uncomment to run."

Looking into the unaccent module, I don't quite understand how it will
work with various encodings, since it doesn't seem to decode its input -
will it fail if run under anything but ASCII or UTF8?

In any case, I see that unaccent.sql starts as follows:

CREATE EXTENSION unaccent;

-- must have a UTF8 database
SELECT getdatabaseencoding();

SET client_encoding TO 'UTF8';

Would doing the same thing in fuzzystrmatch.sql fix the problem with
failing tests? Should I prepare a new patch?
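
To make the question concrete, here is a minimal sketch of how the top of
fuzzystrmatch.sql might look if the unaccent.sql preamble were copied over
unchanged. The placement and the CREATE EXTENSION line are assumptions for
illustration, not part of the posted patch; the final SELECT is the test case
quoted above.

```sql
-- Sketch only: assumes the unaccent.sql preamble is reused as-is.
CREATE EXTENSION fuzzystrmatch;

-- must have a UTF8 database
-- (the expected output records 'UTF8', so any other encoding shows up as a diff here)
SELECT getdatabaseencoding();

-- interpret the bytes in this script as UTF8, regardless of the client's locale
SET client_encoding TO 'UTF8';

-- non-ASCII test case from the patch
SELECT daitch_mokotoff('Straßburg');
```

Note that this would not make the test pass in a non-UTF8 database. It would
only make the cause of the failure visible at the getdatabaseencoding() line.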

Best regards

Dag Lem

In response to

Re: daitch_mokotoff module at 2022-01-03 02:41:53 from Tom Lane

Responses

Re: daitch_mokotoff module at 2022-01-03 16:34:36 from Tom Lane
