Re: Support LIKE with nondeterministic collations

From: Nico Williams <nico(at)cryptonector(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pgsql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support LIKE with nondeterministic collations
Date: 2026-05-18 11:51:53
Message-ID: agr9WXfspaNtzY0x@ubby
Lists: pgsql-hackers

On Wed, Jul 31, 2024 at 03:26:34PM -0700, Jeff Davis wrote:
> On Fri, 2024-05-03 at 16:58 +0200, Daniel Verite wrote:
> > In other words it says that
> >
> >   col LIKE 'smith%' collate "nd"
> >
> > is equivalent to:
> >
> >   col >= 'smith' collate "nd" AND col < U&'smith\ffff' collate "nd"
>
> That logic seems to assume something about the collation. If you have a
> collation that orders strings by their sha256 hash, that would entirely
> break the connection between prefixes and ranges, and it wouldn't work.

The hash of what? Each character's name, or its canonical representation
in some UTF? If so, then to maintain the above equivalence one would have
to alter the definition of this 'hash-based collation' so that U+FFFF
always sorts last.
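To make the disagreement concrete, here is a minimal sketch in Python rather than SQL. Everything in it is a toy assumption: `ci_key` stands in for a case-insensitive collation by comparing casefolded code points, `sha_key` stands in for the hypothetical hash-ordered collation, and `like_prefix`, `in_range`, and `WORDS` are made-up names. It is not how PostgreSQL or ICU implement any of this. The sample data is ASCII only, so U+FFFF does sort after every character the toy key ever sees.

```python
import hashlib

def ci_key(s):
    # Toy "collation": order by casefolded code points (assumption, ASCII data only).
    return s.casefold()

def sha_key(s):
    # Hypothetical collation that orders strings by their sha256 digest.
    return hashlib.sha256(s.encode("utf-8")).digest()

def like_prefix(value, prefix, key):
    # value LIKE 'prefix%': some leading part of value compares equal to prefix.
    return any(key(value[:i]) == key(prefix) for i in range(len(value) + 1))

def in_range(value, prefix, key):
    # value >= prefix AND value < prefix || U+FFFF, both compared via the key.
    return key(prefix) <= key(value) < key(prefix + "\uffff")

WORDS = ["Smith", "smithers", "SMITHSON", "smit", "smiti", "jones", "smyth"]

def mismatches(key):
    # Words on which the LIKE form and the range form disagree.
    return [w for w in WORDS
            if like_prefix(w, "smith", key) != in_range(w, "smith", key)]

ci_mismatches = mismatches(ci_key)    # empty for this data
sha_mismatches = mismatches(sha_key)  # depends on the digests; no order is implied
print("case-insensitive:", ci_mismatches)
print("sha256-ordered:  ", sha_mismatches)
```

With the toy case-insensitive key the two forms agree on every sample word, because equal prefixes produce equal key prefixes and U+FFFF sits above everything else. With the digest key there is no relation between a string's position and its prefix, so any agreement is accidental.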

> Is there something about the way collations are defined that inherently
> maintains a connection between a prefix and a range? [...]

Yes: rules like the one Daniel described.

> [...]? Does it remain
> true even when strange rules are added to a collation?

There are 'strange rules' which cannot be used in defining a collation,
as the result would not then be a collation.

Nico
--
