Re: collate not support Unicode Variation Selector

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: n2029(at)ndensan(dot)co(dot)jp
Cc: thomas(dot)munro(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: collate not support Unicode Variation Selector
Date: 2022-08-05 06:50:32
Message-ID: 20220805.155032.1548634303804827517.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Thu, 4 Aug 2022 19:01:33 +0900, 荒井元成 <n2029(at)ndensan(dot)co(dot)jp> wrote in
> Thank you for your reply.
>
> SQLServer supports Unicode Variation Selector, so I would like PostgreSQL to
> support them as well.

I studied the code a bit further and found that a simple comparison
can ignore the selectors when a nondeterministic collation is used.

CREATE COLLATION col1 (provider=icu, locale='ja', deterministic=false);
SELECT (U&'\+003436' || U&'\+0E0101' || U&'\+00304D' collate col1) = U&'\+003436' || U&'\+00304D';
?column?
----------
t
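
To make the difference between the two operands concrete, here is a small Python sketch (not PostgreSQL or ICU code). It builds the same code point sequences as the query above and compares them both bytewise and with variation selectors dropped. The helper strip_variation_selectors is hypothetical and only approximates what the collation does: ICU ignores the selector through its collation rules rather than by removing characters, and the ranges covered here (U+FE00..U+FE0F and U+E0100..U+E01EF) are an assumption about which selectors matter for this case.

```python
# Hypothetical helper, not part of PostgreSQL or ICU: drops Unicode
# variation selectors in U+FE00..U+FE0F and U+E0100..U+E01EF.
def strip_variation_selectors(s: str) -> str:
    return "".join(
        ch for ch in s
        if not (0xFE00 <= ord(ch) <= 0xFE0F or 0xE0100 <= ord(ch) <= 0xE01EF)
    )

# Same code points as the two operands in the query above.
with_ivs = "\u3436\U000E0101\u304D"   # U+3436, U+E0101 (selector), U+304D
without_ivs = "\u3436\u304D"          # U+3436, U+304D

# A bytewise comparison sees the extra selector and reports a difference.
bytewise_equal = with_ivs.encode("utf-8") == without_ivs.encode("utf-8")

# Once the selector is ignored, the two strings compare equal.
selector_blind_equal = strip_variation_selectors(with_ivs) == without_ivs

print(bytewise_equal, selector_blind_equal)
```

The first value corresponds to what a bytewise comparison would report, the second to the result col1 gives in the query.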

However, LIKE dislikes this.

> ERROR: nondeterministic collations are not supported for LIKE

Deterministic collations assume that text equality means bytewise
equality, so the "problem" behavior is correct in a sense. By the same
token, the functions that do not support nondeterministic collations
can be implemented without considering ICU, which is what leads to the
"problem" behavior. ICU has a regular expression facility, so LIKE might
be able to be implemented using it. If that were done, and if a
nondeterministic IVS-sensitive collation were available (I didn't find
how to get one..), LIKE would work as you expect.

But..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
