Re: Wrong results with equality search using trigram index and non-deterministic collation

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Wrong results with equality search using trigram index and non-deterministic collation
Date: 2026-04-22 06:45:01
Message-ID: cad6d0fa-b81a-4b43-803c-bf1dbbc6385a@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17.09.2024 08:00, Laurenz Albe wrote:
> Using a trigram index with an non-deterministic collation can
> lead to wrong query results:
>
> CREATE COLLATION faux_cn (PROVIDER = icu, LOCALE = 'und', DETERMINISTIC = FALSE, RULES = '&l = r');
>
> CREATE TABLE boom (id integer PRIMARY KEY, t text COLLATE faux_cn);
>
> INSERT INTO boom VALUES (1, 'right'), (2, 'light');
>
> SELECT * FROM boom WHERE t = 'right';
>
> id │ t
> ════╪═══════
> 1 │ right
> 2 │ light
> (2 rows)
>
> CREATE INDEX ON boom USING gin (t gin_trgm_ops);
>
> SET enable_seqscan = off;
>
> SELECT * FROM boom WHERE t = 'right';
>
> id │ t
> ════╪═══════
> 1 │ right
> (1 row)
>
> I also see questionable results with the similarity operator (with and
> without the index):
>
> SELECT * FROM boom WHERE t % 'rigor';
>
> id │ t
> ════╪═══════
> 1 │ right
> (1 row)
>
> But here you could argue that the operator ignores the collation, so
> the result is correct. With equality, there is no such loophole.

I think we should change that. It's very counter intuitive that a query
can change behavior when the planner flips from using e.g. a Seq Scan to
a Bitmap Index Scan or the other way around. There's already a patch for
that, see [1].

> I don't know what the correct fix would be. Perhaps just refusing to use
> the index for equality comparisons with non-deterministic collations.

If we merge [1], then not only = but also LIKE would be incorrect. How
about disabling CREATE INDEX USING gin on columns with non-deterministic
collations?

Or is there maybe a way to make these cases work correctly for
non-deterministic collations by applying the collation when extracting
the search trigrams? I take a look into that.

[1]
https://www.postgresql.org/message-id/flat/db087c3e-230e-4119-8a03-8b5d74956bc2%40gmail.com

--
David Geier

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-04-22 06:45:29 Re: Fix memory leak in postmasterMain
Previous Message Richard Guo 2026-04-22 06:36:23 Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations