| From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
|---|---|
| To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Cc: | adam(dot)warland(at)infor(dot)com |
| Subject: | BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation |
| Date: | 2025-12-02 10:03:44 |
| Message-ID: | 19341-1d9a22915edfec58@postgresql.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 19341
Logged by: Adam Warland
Email address: adam(dot)warland(at)infor(dot)com
PostgreSQL version: 18.1
Operating system: Windows 11 Enterprise
Description:
When using a nondeterministic ICU collation, the replace() function fails to
replace a substring when that substring appears at the end of the input
string.
Occurrences of the same substring earlier in the string are replaced
normally.
This appears to be unintended and inconsistent with the documented
limitations of nondeterministic collations. The failure seems specific to
situations where:
• a nondeterministic ICU collation is applied to both source and
match strings, and
• the substring being replaced appears as the final character of the
source string.
The behavior reproduces reliably.
Expected Behavior
replace() should replace all occurrences of the match substring, including
one at the final position, regardless of collation — or, if nondeterministic
collations cannot support this operation, documentation should explicitly
state the limitation (as is already done for LIKE and regular expressions).
Actual Behavior
Under a nondeterministic ICU collation, the final occurrence of the
substring is not replaced, even though earlier occurrences are.
Example output (actual) replace x with y :
res1 | testx -- unchanged, incorrect (final character not replaced)
res2 | yabcdx -- first 'x' replaced, final 'x' not replaced
Under a deterministic or C collation, output is correct:
res_C | testy
Environment
SELECT version();
→ PostgreSQL 18.1 (Debian 18.1-1.pgdg13+2) on x86_64-pc-linux-gnu, compiled
by gcc (Debian 14.2.0-19) 14.2.0, 64-bit.
SHOW server_encoding;
→ UTF8
SHOW lc_collate;
SHOW lc_ctype;
→ en_US.utf8
en_US.utf8
Specific collation used:
create collation test_nondeterministic (
provider = icu,
locale = 'und-u-ks-level2',
deterministic = false
)
Minimal Reproduction Case
drop COLLATION if EXISTS test_nondeterministic;
create collation test_nondeterministic (
provider = icu,
locale = 'und-u-ks-level2',
deterministic = false
);
-- Replace final character under nondeterministic collation
SELECT replace(
'testx' COLLATE "test_nondeterministic",
'x' COLLATE "test_nondeterministic",
'y') AS res1;
-- Replace substring appearing twice — final one fails
SELECT replace(
'xabcdx' COLLATE "test_nondeterministic",
'x' COLLATE "test_nondeterministic",
'y') AS res2;
-- Control test using deterministic collation
SELECT replace(
'testx' COLLATE "C",
'x' COLLATE "C",
'y') AS res_C;
Observed result:
res1 and the final x in res2 are not replaced.
Additional Notes
• The issue does not occur with deterministic ICU collations or with
C collation.
• The failure seems tied specifically to the last character
position, which may indicate an off-by-one issue or a limitation in
substring matching under nondeterministic collation rules.
• No documentation currently states that replace() is partially
unsupported with nondeterministic collations, although other operations
(LIKE, regex) have historically been restricted.
Conclusion
replace() appears to behave incorrectly when matching a substring at the end
of a string under nondeterministic ICU collations. This is either:
• a defect in how nondeterministic collations interact with
substring matching functions, or
an undocumented limitation that should be clarified.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Laurenz Albe | 2025-12-02 11:30:16 | Re: BUG #19340: Wrong result from CORR() function |
| Previous Message | PG Bug reporting form | 2025-12-02 07:57:35 | BUG #19340: Wrong result from CORR() function |