Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rahman Duran <a(dot)rahman(dot)duran(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance
Date: 2025-12-30 06:51:40
Message-ID: 3287814.1767077500@sss.pgh.pa.us
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-general

Rahman Duran <a(dot)rahman(dot)duran(at)gmail(dot)com> writes:
> After the release of the PostgreSQL 18 version, I am trying non
> determisinstic collation with LIKE pattern matching support. I am mostly
> searching with "LIKE %search_term%" on about 10 text columns. As I use
> wildcard prefix and suffix, I can't use btree index anyways. So I decided
> to try non deterministic collation support so I can simplify application
> code. I am testing this on a table with ~60K rows. With this row count and
> search pattern, non deterministic collation seems at least 10 times slower
> than LOWER LIKE and ILIKE.

This is not terribly surprising: non-deterministic collations disable
a lot of lower-level optimizations in pattern matching. I think the
particular one that is biting you is probably this bit in
src/backend/utils/adt/like_match.c:

* ... With a nondeterministic collation, we can't
* rely on the first bytes being equal, so we have to recurse in
* any case.

or possibly the later bit

* For nondeterministic locales, we find the next substring of the
* pattern that does not contain wildcards and try to find a
* matching substring in the text. Crucially, we cannot do this
* character by character, as in the normal case, but must do it
* substring by substring, partitioned by the wildcard characters.
* (This is per SQL standard.)

The fundamental problem here is not wanting to make assumptions about
which character strings a non-deterministic collation will consider
equal to which other character strings. If you have concrete ideas
about how to improve that, let's hear them.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Rahman Duran 2025-12-30 07:07:22 Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance
Previous Message Rahman Duran 2025-12-30 06:30:27 PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance