Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, adam(dot)warland(at)infor(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation
Date:	2025-12-02 17:29:06
Message-ID:	6387cb3e-aec8-41a0-acef-bacdbfb435db@iki.fi
Lists:	pgsql-bugs

On 02/12/2025 18:36, Heikki Linnakangas wrote:
> On 02/12/2025 18:24, Laurenz Albe wrote:
>> On Tue, 2025-12-02 at 10:03 +0000, PG Bug reporting form wrote:
>>> PostgreSQL version: 18.1
>>>
>>> When using a nondeterministic ICU collation, the replace() function
>>> fails to replace a substring when that substring appears at the end
>>> of the input string.
>>>
>>> Occurrences of the same substring earlier in the string are replaced
>>> normally.
>>>
>>> Specific collation used:
>>> create collation test_nondeterministic (
>>>      provider = icu,
>>>      locale = 'und-u-ks-level2',
>>>      deterministic = false
>>> );
>>>
>>> -- Replace final character under nondeterministic collation
>>> SELECT replace(
>>>      'testx' COLLATE "test_nondeterministic",
>>>      'x'     COLLATE "test_nondeterministic",
>>>      'y') AS res1;
>>
>> I can reproduce the problem, and the attached patch fixes it for me.
>
> +1, looks good to me. Let's also add a regression test for this.

I added a simple test for this, and I think the patch is still not quite
right. I added the following to the collate.icu.utf8 test:

CREATE TABLE test4nfd (a int, b text);
INSERT INTO test4nfd VALUES (1, 'cote'), (2, 'côte'), (3, 'coté'), (4, 'côté');
UPDATE test4nfd SET b = normalize(b, nfd);
-- This shows why replace should be greedy. Otherwise, in the NFD
-- case, the match would stop before the decomposed accents, which
-- would leave the accents in the results.
SELECT a, b, replace(b COLLATE ignore_accents, 'co', 'ma') FROM test4;
 a |  b   | replace
---+------+---------
 1 | cote | mate
 2 | côte | mate
 3 | coté | maté
 4 | côté | maté
(4 rows)

In the added test query, the accents on the 'o' are stripped, which
doesn't look correct.
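
To make the NFC/NFD difference behind the "greedy" comment easier to inspect, here is a minimal sketch that puts both normalization forms of the same words side by side and runs the same replace() on each. It assumes PostgreSQL 18 built with ICU and a UTF8 database. The names demo_ignore_accents and accent_demo are hypothetical, and the locale string is an assumption about how an accent-insensitive collation might be defined, not necessarily the definition the regression test uses.

```sql
-- Assumption: a primary-strength ICU collation that ignores accents
-- but still distinguishes case. The name is hypothetical.
CREATE COLLATION IF NOT EXISTS demo_ignore_accents (
    provider = icu,
    locale = 'und-u-ks-level1-kc-true',
    deterministic = false
);

-- Hypothetical scratch table holding each word in both normalization forms.
CREATE TEMP TABLE accent_demo (a int, form text, b text);

-- Composed form: an accented letter is a single code point.
INSERT INTO accent_demo
SELECT v.a, 'NFC', normalize(v.b, NFC)
FROM (VALUES (1, 'cote'), (2, 'côte'), (3, 'coté'), (4, 'côté')) AS v(a, b);

-- Decomposed form: base letter followed by a combining accent.
INSERT INTO accent_demo
SELECT a, 'NFD', normalize(b, NFD)
FROM accent_demo
WHERE form = 'NFC';

-- Compare code point counts, raw bytes, and the replace() result per form.
SELECT a,
       form,
       b,
       length(b) AS code_points,
       encode(convert_to(b, 'UTF8'), 'hex') AS utf8_hex,
       replace(b COLLATE demo_ignore_accents, 'co', 'ma') AS replaced
FROM accent_demo
ORDER BY a, form;
```

The code_points and utf8_hex columns show whether the accent on the 'o' is part of the same code point or a separate combining character, which is the case the comment in the test is concerned with.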

- Heikki

In response to

Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation at 2025-12-02 16:36:06 from Heikki Linnakangas

Responses

Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation at 2025-12-02 17:45:47 from Laurenz Albe
