| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Rahman Duran <a(dot)rahman(dot)duran(at)gmail(dot)com> |
| Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
| Subject: | Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance |
| Date: | 2025-12-30 06:51:40 |
| Message-ID: | 3287814.1767077500@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
Rahman Duran <a(dot)rahman(dot)duran(at)gmail(dot)com> writes:
> After the release of the PostgreSQL 18 version, I am trying non
> determisinstic collation with LIKE pattern matching support. I am mostly
> searching with "LIKE %search_term%" on about 10 text columns. As I use
> wildcard prefix and suffix, I can't use btree index anyways. So I decided
> to try non deterministic collation support so I can simplify application
> code. I am testing this on a table with ~60K rows. With this row count and
> search pattern, non deterministic collation seems at least 10 times slower
> than LOWER LIKE and ILIKE.
This is not terribly surprising: non-deterministic collations disable
a lot of lower-level optimizations in pattern matching. I think the
particular one that is biting you is probably this bit in
src/backend/utils/adt/like_match.c:
* ... With a nondeterministic collation, we can't
* rely on the first bytes being equal, so we have to recurse in
* any case.
or possibly the later bit
* For nondeterministic locales, we find the next substring of the
* pattern that does not contain wildcards and try to find a
* matching substring in the text. Crucially, we cannot do this
* character by character, as in the normal case, but must do it
* substring by substring, partitioned by the wildcard characters.
* (This is per SQL standard.)
The fundamental problem here is not wanting to make assumptions about
which character strings a non-deterministic collation will consider
equal to which other character strings. If you have concrete ideas
about how to improve that, let's hear them.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Rahman Duran | 2025-12-30 07:07:22 | Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance |
| Previous Message | Rahman Duran | 2025-12-30 06:30:27 | PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance |