| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, adam(dot)warland(at)infor(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation |
| Date: | 2025-12-02 17:29:06 |
| Message-ID: | 6387cb3e-aec8-41a0-acef-bacdbfb435db@iki.fi |
| Lists: | pgsql-bugs |
On 02/12/2025 18:36, Heikki Linnakangas wrote:
> On 02/12/2025 18:24, Laurenz Albe wrote:
>> On Tue, 2025-12-02 at 10:03 +0000, PG Bug reporting form wrote:
>>> PostgreSQL version: 18.1
>>>
>>> When using a nondeterministic ICU collation, the replace() function
>>> fails to replace a substring when that substring appears at the end
>>> of the input string.
>>>
>>> Occurrences of the same substring earlier in the string are replaced
>>> normally.
>>>
>>> Specific collation used:
>>> create collation test_nondeterministic (
>>> provider = icu,
>>> locale = 'und-u-ks-level2',
>>> deterministic = false
>>> )
>>>
>>> -- Replace final character under nondeterministic collation
>>> SELECT replace(
>>> 'testx' COLLATE "test_nondeterministic",
>>> 'x' COLLATE "test_nondeterministic",
>>> 'y') AS res1;
>>
>> I can reproduce the problem, and the attached patch fixes it for me.
>
> +1, looks good to me. Let's also add a regression test for this.
I added a simple test for this, and I think the result is still not
quite right. I added the following to the collate.icu.utf8 test:
CREATE TABLE test4nfd (a int, b text);
INSERT INTO test4nfd VALUES (1, 'cote'), (2, 'côte'), (3, 'coté'), (4, 'côté');
UPDATE test4nfd SET b = normalize(b, nfd);
-- This shows why replace should be greedy. Otherwise, in the NFD
-- case, the match would stop before the decomposed accents, which
-- would leave the accents in the results.
SELECT a, b, replace(b COLLATE ignore_accents, 'co', 'ma') FROM test4;
a | b | replace
---+------+---------
1 | cote | mate
2 | côte | mate
3 | coté | maté
4 | côté | maté
(4 rows)
SELECT a, b, replace(b COLLATE ignore_accents, 'co', 'ma') FROM test4nfd;
a | b | replace
---+------+---------
1 | cote | mate
2 | côte | mate
3 | coté | maté
4 | côté | maté
(4 rows)
+-- Test for match at the end of the string. (We had a bug on that
+-- once)
+SELECT a, b, replace(b COLLATE ignore_accents, 'te', 'ma') FROM test4nfd;
+ a | b | replace
+---+------+---------
+ 1 | cote | coma
+ 2 | côte | coma
+ 3 | coté | coma
+ 4 | côté | coma
+(4 rows)
+
In the added test query, the accents on the 'o' are stripped (rows 2 and
4 come out as 'coma' rather than 'côma'), which doesn't look correct.
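To make the expectation concrete, here is a toy model in Python. It is only a sketch and not how PostgreSQL or ICU perform the match: it assumes the needle is unaccented, compares base characters only, and treats each base character together with its trailing combining marks as one unit. The helper names `clusters` and `replace_ignoring_accents` are made up for this illustration.

```python
import unicodedata


def clusters(s):
    """Split the NFD form of s into base characters plus their combining marks."""
    out = []
    for ch in unicodedata.normalize("NFD", s):
        if out and unicodedata.combining(ch):
            out[-1] += ch          # attach the mark to the preceding base character
        else:
            out.append(ch)
    return out


def replace_ignoring_accents(s, needle, repl):
    """Toy accent-insensitive replace (assumes an unaccented needle)."""
    if not needle:
        return s
    cl = clusters(s)
    bases = "".join(c[0] for c in cl)   # the string with all accents dropped
    out = []
    i = 0
    while i < len(cl):
        if bases.startswith(needle, i):
            # Greedy: the match swallows the marks of the matched characters,
            # but never the marks of characters before the match.
            out.append(repl)
            i += len(needle)
        else:
            out.append(cl[i])
            i += 1
    return unicodedata.normalize("NFC", "".join(out))


for word in ["cote", "côte", "coté", "côté"]:
    print(word, replace_ignoring_accents(word, "te", "ma"))
# cote coma
# côte côma
# coté coma
# côté côma
```

Under this model the circumflex on the 'o' survives because it belongs to a character outside the match, while the acute on the final 'e' is consumed with it. The same function gives 'mate' and 'maté' for the 'co' to 'ma' case, in line with the existing expected output above.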
- Heikki