Re: insensitive collations

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Peter Eisentraut" <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: insensitive collations
Date: 2019-01-30 15:30:54
Message-ID: 693ad06d-db9a-4e59-8131-f823483c5893@manitou-mail.org
Lists: pgsql-hackers

Peter Eisentraut wrote:

> Another patch.

+ <literal>ks</literal> key), in order for such such collations to act in a

s/such such/such/

+ <para>
+ The pattern matching operators of all three kinds do not support
+ nondeterministic collations. If required, apply a different collation to
+ the expression to work around this limitation.
+ </para>

This is an important point of comparison between case-insensitive (CI)
collations and contrib/citext, since the latter overrides a number of
functions and operators to make them do case-insensitive pattern matching.
The citext documentation explains the rationale for using it instead of
text; it should probably now be expanded a bit with the pros and cons of
choosing citext versus nondeterministic collations.

The current patch doesn't alter a few string functions that could
potentially implement collation-aware string search, such as
replace(), strpos(), and starts_with().
ISTM that we should not let these functions silently ignore the collation:
they ought to error out until their implementations use the ICU
collation-aware string search.
FWIW I've been experimenting with usearch_openFromCollator() and
other usearch_* functions, and it looks doable to implement at least the
3 above functions based on that, even though the UTF16-ness of the API
does not favor us.

ICU also provides regexp matching, but it is not collation-aware, since
character-based patterns don't play well with the concept of collation.
A potential collation-aware LIKE looks hard to implement, since the
algorithm currently used in like_match.c seems purely character-based.
AFAICS there's no way to plug calls to usearch_* functions into it; it
would need a separate redesign from scratch.
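
To illustrate what "purely character-based" means here, below is a minimal
sketch of a LIKE-style matcher. It is NOT the code from like_match.c: the
function simple_like() is a hypothetical name, it works byte by byte, and it
ignores escapes and multibyte characters. The point is only that every step
compares one unit of the pattern against one unit of the text, which leaves
no natural place to ask a collator whether two substrings are equivalent.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified LIKE matcher (illustration only).
 * '%' matches any run of bytes, '_' matches exactly one byte,
 * anything else must be byte-for-byte identical.
 */
bool
simple_like(const char *t, const char *p)
{
    while (*p != '\0')
    {
        if (*p == '%')
        {
            /* Collapse consecutive '%' wildcards. */
            while (*p == '%')
                p++;
            /* A trailing '%' matches whatever text is left. */
            if (*p == '\0')
                return true;
            /* Try to match the rest of the pattern at each text position. */
            for (; *t != '\0'; t++)
            {
                if (simple_like(t, p))
                    return true;
            }
            return false;
        }
        if (*t == '\0')
            return false;       /* pattern has more to match, text is done */
        if (*p != '_' && *p != *t)
            return false;       /* plain byte comparison, no collation */
        t++;
        p++;
    }
    /* Pattern exhausted: match only if the text is exhausted too. */
    return *t == '\0';
}

/* Usage sketch: what byte-wise matching accepts and rejects. */
void
simple_like_demo(void)
{
    assert(simple_like("foobar", "foo%"));
    /* A case-insensitive collation would want this to match. */
    assert(!simple_like("FOOBAR", "foo%"));
    /* Precomposed U+00E9 versus 'e' followed by U+0301, both in UTF-8. */
    assert(!simple_like("e\xcc\x81", "\xc3\xa9"));
}
```

In the last case the two strings have different lengths in bytes and in code
points, so a matcher that advances one unit at a time on both sides cannot
treat them as equal, whereas a nondeterministic collation may well do so.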

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
