Re: Pre-proposal: unicode normalized text

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-proposal: unicode normalized text
Date: 2023-10-11 07:37:46
Message-ID: 5661a3b1cd8cf046d6b761c1bcf4eb82cb58397d.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2023-10-11 at 08:56 +0200, Peter Eisentraut wrote:
> On 11.10.23 03:08, Jeff Davis wrote:
> >    * unicode_is_valid(text): returns true if all codepoints are
> > assigned, false otherwise
>
> We need to be careful about precise terminology.  "Valid" has a defined
> meaning for Unicode.  A byte sequence can be valid or not as UTF-8.  But
> a string containing unassigned code points is not not-"valid" as
> Unicode.

Agreed. Perhaps "unicode_assigned()" is better?
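
To make the distinction concrete, here is a minimal sketch of the
"all code points assigned" check, written in Python rather than against
the Postgres sources. It assumes that "assigned" means "general category
is not Cn"; the function name mirrors the suggestion above but the body
is purely illustrative, and the answer depends on the Unicode version
that Python's unicodedata module was built from.

```python
import unicodedata


def unicode_assigned(s: str) -> bool:
    """Illustrative only: True if no code point in s has category Cn.

    Assumption: "assigned" is taken to mean "general category != Cn",
    judged against the Unicode data bundled with this Python build.
    """
    # "Cn" is the general category "Other, not assigned".
    return all(unicodedata.category(ch) != "Cn" for ch in s)


if __name__ == "__main__":
    print(unicode_assigned("abc"))
    # U+10FFFF is a noncharacter, which is permanently in category Cn.
    print(unicode_assigned("abc\U0010FFFF"))
```

Note that both strings above are well-formed and can be encoded as
valid UTF-8; only the second contains a code point without an
assignment, which is the case the renamed function would report.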

> >    * unicode_version(): version of Unicode Postgres is built with
> >    * icu_unicode_version(): version of Unicode ICU is built with
>
> This seems easy enough, but it's not clear what users would actually do
> with that.

It's just there to make the version visible. If the version affects
semantics (which it currently does for normalization), it seems wise to
have some way to access it.
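
As a rough illustration of why the version matters, the sketch below
again uses Python's unicodedata module, not Postgres. Python exposes
both its current Unicode database and a frozen Unicode 3.2.0 one, so
the same code point can be checked against two versions. The helper
is_assigned() is a hypothetical name, and the "category != Cn" test is
the same assumption as before.

```python
import unicodedata

# Unicode version of the database compiled into this Python build.
current_version = unicodedata.unidata_version

# Python also ships a frozen Unicode 3.2.0 database with the same methods.
old_db = unicodedata.ucd_3_2_0


def is_assigned(db, ch: str) -> bool:
    """Hypothetical helper: True if ch is not category Cn in database db."""
    return db.category(ch) != "Cn"


if __name__ == "__main__":
    ch = "\U0001F600"  # GRINNING FACE, added well after Unicode 3.2
    print("current:", current_version, is_assigned(unicodedata, ch))
    print("old:    ", old_db.unidata_version, is_assigned(old_db, ch))
```

The two lines printed should disagree about whether the code point is
assigned, which is the kind of version-dependent behavior a user could
only diagnose if the version in use is visible.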

Regards,
Jeff Davis
