Use correct collation in pg_trgm

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Use correct collation in pg_trgm
Date: 2026-01-21 15:36:18
Message-ID: db087c3e-230e-4119-8a03-8b5d74956bc2@gmail.com
Lists: pgsql-hackers

Hi hackers,

In thread [1] we found that pg_trgm always uses DEFAULT_COLLATION_OID
for converting trigrams to lower-case. Here are some examples where
today the collation is ignored:

CREATE EXTENSION pg_trgm;
CREATE COLLATION turkish (provider = libc, locale = 'tr_TR.utf8');

postgres=# SELECT show_trgm('ISTANBUL' COLLATE "turkish");
show_trgm
---------------------------------------------
{"  i"," is",anb,bul,ist,nbu,sta,tan,"ul "}

CREATE TABLE test(col TEXT COLLATE "turkish");
INSERT INTO test VALUES ('ISTANBUL');

postgres=# select show_trgm(col) FROM test;
show_trgm
---------------------------------------------
{"  i"," is",anb,bul,ist,nbu,sta,tan,"ul "}

postgres=# SELECT similarity('ıstanbul' COLLATE "turkish", 'ISTANBUL'
COLLATE "turkish");
similarity
------------
0.5

If the database is initialized via initdb --locale="tr_TR.utf8", the
output changes:

postgres=# SELECT show_trgm('ISTANBUL');
show_trgm
--------------------------------------------------------
{0xf31e1a,0xfe581d,0x3efd30,anb,bul,nbu,sta,tan,"ul "}

and

postgres=# select show_trgm(col) FROM test;
show_trgm
--------------------------------------------------------
{0xf31e1a,0xfe581d,0x3efd30,anb,bul,nbu,sta,tan,"ul "}

postgres=# SELECT similarity('ıstanbul' COLLATE "turkish", 'ISTANBUL'
COLLATE "turkish");
similarity
------------
1

tr_TR.utf8 lower-cases capital I to ı (dotless i), which is a multibyte
character in UTF-8, while my default collation lower-cases I to i.
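
To make the 0.5 vs. 1 results above easier to follow, here is a rough
Python model of the trigram comparison. It is only a sketch and not
pg_trgm's code: lower_default and lower_turkish are hypothetical
stand-ins for the two lower-casing behaviours, the word splitting is
simplified, and it does not model how pg_trgm stores trigrams that
contain multibyte characters (the hex values in the output above).

```python
import re


def lower_default(s: str) -> str:
    # Locale-independent lower-casing: 'I' becomes 'i'.
    return s.lower()


def lower_turkish(s: str) -> str:
    # Hypothetical stand-in for Turkish case rules:
    # 'I' becomes dotless 'ı', dotted 'İ' becomes 'i'.
    return s.replace("I", "ı").replace("İ", "i").lower()


def trigrams(text: str, lower) -> set:
    # Lower-case first, then pad each word with two leading spaces
    # and one trailing space before cutting it into 3-character pieces.
    result = set()
    for word in re.findall(r"[^\W_]+", lower(text)):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str, lower) -> float:
    # Shared trigrams divided by the total number of distinct trigrams.
    ta, tb = trigrams(a, lower), trigrams(b, lower)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(similarity("ıstanbul", "ISTANBUL", lower_default))  # 0.5
    print(similarity("ıstanbul", "ISTANBUL", lower_turkish))  # 1.0
```

With the default lower-casing, the three trigrams that contain the
first letter differ (i vs. ı), so only 6 of 12 distinct trigrams are
shared. With Turkish lower-casing both strings become 'ıstanbul' and
all 9 trigrams match.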

The attached patch attempts to fix that. I grepped for all occurrences
of DEFAULT_COLLATION_OID in contrib/pg_trgm and replaced them with the
function's collation OID.

The corresponding regression tests pass.

[1]
https://www.postgresql.org/message-id/e5dd01c6-c469-405d-aea2-feca0b2dc34d%40gmail.com

--
David Geier

Attachment Content-Type Size
v1-0001-Use-correct-collation-in-pg_trgm.patch text/x-patch 14.8 KB
