| From: | David Geier <geidav(dot)pg(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Subject: | Use correct collation in pg_trgm |
| Date: | 2026-01-21 15:36:18 |
| Message-ID: | db087c3e-230e-4119-8a03-8b5d74956bc2@gmail.com |
| Lists: | pgsql-hackers |
Hi hackers,
In thread [1] we found that pg_trgm always uses DEFAULT_COLLATION_OID
for converting trigrams to lower-case. Here are some examples where
today the collation is ignored:
CREATE EXTENSION pg_trgm;
CREATE COLLATION turkish (provider = libc, locale = 'tr_TR.utf8');
postgres=# SELECT show_trgm('ISTANBUL' COLLATE "turkish");
show_trgm
---------------------------------------------
{" i"," is",anb,bul,ist,nbu,sta,tan,"ul "}
CREATE TABLE test(col TEXT COLLATE "turkish");
INSERT INTO test VALUES ('ISTANBUL');
postgres=# select show_trgm(col) FROM test;
show_trgm
---------------------------------------------
{" i"," is",anb,bul,ist,nbu,sta,tan,"ul "}
postgres=# SELECT similarity('ıstanbul' COLLATE "turkish", 'ISTANBUL'
COLLATE "turkish");
similarity
------------
0.5
If the database is initialized via initdb --locale="tr_TR.utf8", the
output changes:
postgres=# SELECT show_trgm('ISTANBUL');
show_trgm
--------------------------------------------------------
{0xf31e1a,0xfe581d,0x3efd30,anb,bul,nbu,sta,tan,"ul "}
and
postgres=# select show_trgm(col) FROM test;
show_trgm
--------------------------------------------------------
{0xf31e1a,0xfe581d,0x3efd30,anb,bul,nbu,sta,tan,"ul "}
postgres=# SELECT similarity('ıstanbul' COLLATE "turkish", 'ISTANBUL'
COLLATE "turkish");
similarity
------------
1
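tr_TR.utf8 lower-cases capital I to ı (dotless i), which is a multibyte
character, while my default collation lower-cases I to i. The three
trigrams that contain the first letter therefore differ between the two
setups, and show_trgm prints them as hex values in the tr_TR.utf8 case.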
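To make the numbers above easier to follow, here is a rough Python sketch of the arithmetic. It is a simplified model and not pg_trgm's actual code: lower_turkish is a hand-written stand-in for the tr_TR case rules, all function names are made up for illustration, and only single words are handled (two blanks of padding in front, one behind).

```python
def lower_default(s):
    # Locale-independent lowercasing: 'I' -> 'i', 'ı' stays 'ı'.
    return s.lower()


def lower_turkish(s):
    # Assumption: hand-coded approximation of tr_TR lowercasing,
    # dotless 'I' -> 'ı' and dotted 'İ' -> 'i'. Not what libc does internally.
    return s.replace("I", "ı").replace("İ", "i").lower()


def trigrams(word, lower):
    # Lower-case first, then pad with two leading blanks and one trailing blank
    # and collect every distinct 3-character window.
    padded = "  " + lower(word) + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b, lower):
    # Shared trigrams divided by the size of the union of both trigram sets.
    ta, tb = trigrams(a, lower), trigrams(b, lower)
    shared = len(ta & tb)
    return shared / (len(ta) + len(tb) - shared)


if __name__ == "__main__":
    print(sorted(trigrams("ISTANBUL", lower_default)))
    print(sorted(trigrams("ISTANBUL", lower_turkish)))
    print(similarity("ıstanbul", "ISTANBUL", lower_default))
    print(similarity("ıstanbul", "ISTANBUL", lower_turkish))
```

With the default lowercasing the two words share 6 of 12 distinct trigrams, which gives 0.5. With the Turkish mapping both words produce the same 9 trigrams, which gives 1.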
The attached patch attempts to fix that. I grepped for all occurrences
of DEFAULT_COLLATION_OID in contrib/pg_trgm and now use the function's
collation OID instead of DEFAULT_COLLATION_OID.
The corresponding regression tests pass.
[1]
https://www.postgresql.org/message-id/e5dd01c6-c469-405d-aea2-feca0b2dc34d%40gmail.com
--
David Geier
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Use-correct-collation-in-pg_trgm.patch | text/x-patch | 14.8 KB |