
Re: Use correct collation in pg_trgm

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	David Geier <geidav(dot)pg(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject:	Re: Use correct collation in pg_trgm
Date:	2026-03-24 23:55:46
Message-ID:	2c15502fd399128ee27fbe1a305e006780159f66.camel@j-davis.com
Lists:	pgsql-hackers

On Wed, 2026-01-21 at 16:36 +0100, David Geier wrote:
> Hi hackers,
>
> In thread [1] we found that pg_trgm always uses DEFAULT_COLLATION_OID
> for converting trigrams to lower-case. Here are some examples where
> today the collation is ignored:
>
...

> The attached patch attempts to fix that. I grepped for all occurrences
> of DEFAULT_COLLATION_OID in contrib/pg_trgm and use the function's
> collation OID instead of DEFAULT_COLLATION_OID.

Hi,

Thank you for working on this.

This area is a bit awkward conceptually. The case you found is not
about the *sort order* of the values; it's about the casing semantics.
We mix those two concepts into a single "collation oid" that determines
both sort order and casing semantics (and pattern matching semantics,
too).

LOWER() and UPPER() take the casing semantics from the inferred
collation, so that's a good argument that you're doing the right thing
here.
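
To make the casing point concrete, here is a toy sketch in Python. It is
not how pg_trgm or PostgreSQL implement case folding: the function names
and the "default"/"turkish" rule labels are hypothetical, and the Turkish
dotless-i mapping stands in for whatever casing rules a real collation
would supply. The only detail taken from pg_trgm itself is the documented
padding of each word with two leading blanks and one trailing blank.

```python
# Toy model only: NOT PostgreSQL's or pg_trgm's implementation.
# It shows that the casing rules applied before trigram extraction
# change which trigrams come out, independent of any sort order.

def lower_with_rules(text: str, rules: str) -> str:
    """Lower-case text under hypothetical casing rules."""
    if rules == "turkish":
        # Turkish casing: 'I' maps to dotless 'ı', 'İ' maps to 'i'.
        text = text.replace("I", "ı").replace("İ", "i")
    return text.lower()


def trigrams(word: str, rules: str) -> set:
    """Pad one word with two leading and one trailing blank,
    then collect every 3-character window."""
    padded = "  " + lower_with_rules(word, rules) + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


default_set = trigrams("TITLE", "default")
turkish_set = trigrams("TITLE", "turkish")

# Trigrams that survive under both sets of casing rules.
shared = default_set & turkish_set
```

Under these assumptions the two sets share only the trigrams that do not
contain the lower-cased "I", so a similarity score computed from them
depends on which casing rules were used.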

But full text search does not; it uses DEFAULT_COLLATION_OID for
parsing the input. That sort of makes sense, because tsvector/tsquery
don't have a collatable sort order -- it's more about the parsing
semantics to create the values in the first place, not about how the
tsvector/tsquery values are sorted.

So that leaves me wondering: why would pg_trgm use the inferred
collation and tsvector/tsquery use DEFAULT_COLLATION_OID? They seem
conceptually similar, and the only real difference I see is that
tsvector/tsquery are types and pg_trgm is a set of functions.

Note that I made some changes here recently: full text search and ltree
used to use libc unconditionally or a mix of libc and
DEFAULT_COLLATION_OID; that was clearly wrong and I changed it to
consistently use DEFAULT_COLLATION_OID. But I didn't resolve the
conceptual problem of whether we should use the inferred collation (as
you suggest) or not.

Regards,
Jeff Davis

In response to

Use correct collation in pg_trgm at 2026-01-21 15:36:18 from David Geier

Responses

Re: Use correct collation in pg_trgm at 2026-03-26 08:50:28 from David Geier
