Re: Pre-proposal: unicode normalized text

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-proposal: unicode normalized text
Date: 2023-10-10 06:47:31
Message-ID: 205060b0-9e63-4025-93f8-c60ebae42aa7@eisentraut.org
Lists: pgsql-hackers

On 06.10.23 19:22, Jeff Davis wrote:
> On Fri, 2023-10-06 at 09:58 +0200, Peter Eisentraut wrote:
>> If you want to be rigid about it, you also need to consider whether the
>> Unicode version used by the ICU library in use matches the one used by
>> the in-core tables.
> What problem are you concerned about here? I thought about it and I
> didn't see an obvious issue.
>
> If the ICU unicode version is ahead of the Postgres unicode version,
> and no unassigned code points are used according to the Postgres
> version, then there's no problem.
>
> And in the other direction, there might be some code points that are
> assigned according to the postgres unicode version but unassigned
> according to the ICU version. But that would be tracked by the
> collation version as you pointed out earlier, so upgrading ICU would be
> like any other ICU upgrade (with the same risks). Right?

It might be alright in this particular combination of circumstances.
But in general if we rely on these tables for correctness (e.g., check
that a string is normalized before passing it to a function that
requires it to be normalized), we would need to consider this. The
correct fix would then probably be to not use our own tables but use
some ICU function to achieve the desired task.
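
To make the concern concrete, here is a minimal sketch of the kind of guard this implies. It is not PostgreSQL or ICU code. Python's unicodedata module stands in for the in-core tables, and consumer_version is a hypothetical parameter standing in for the Unicode version of whatever library (e.g., ICU) later receives the string. The function names are made up for illustration. The only rule it relies on is the Unicode normalization stability guarantee: a string that is normalized under one version stays normalized under later versions as long as it contains only code points assigned in the earlier version.

```python
import unicodedata


def parse_version(version):
    """Turn a dotted version string such as '15.0.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def has_unassigned(text):
    """True if any code point is unassigned (category Cn) in the local tables."""
    return any(unicodedata.category(ch) == "Cn" for ch in text)


def trusted_as_normalized(text, form, consumer_version):
    """Decide whether a local 'is normalized' answer can be relied on by a
    consumer that uses a possibly different Unicode version.

    consumer_version is a hypothetical input: the Unicode version of the
    library that will receive the string.
    """
    # First ask the local tables (stand-in for the in-core tables).
    if not unicodedata.is_normalized(form, text):
        return False

    local = parse_version(unicodedata.unidata_version)
    consumer = parse_version(consumer_version)

    if consumer == local:
        # Same Unicode version on both sides: the local answer carries over.
        return True
    if consumer > local:
        # Consumer is ahead: the answer carries over only if every code point
        # is assigned according to the local tables.
        return not has_unassigned(text)
    # Consumer is behind: it may not know code points that the local tables
    # treat as assigned, so be conservative and do not trust the local answer.
    return False
```

For example, trusted_as_normalized("abc", "NFC", "99.0.0") is accepted because the consumer is ahead and nothing is unassigned locally, while the same call with "a\u0378" (U+0378 is unassigned) is rejected. Delegating the check to the consumer's own library, as suggested above, avoids needing this guard at all.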
