Re: Pre-proposal: unicode normalized text

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Nico Williams <nico(at)cryptonector(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-03 22:34:44
Message-ID:	a41cfc7c3fcd00ed5d1008cc6fa810340f35fd47.camel@j-davis.com
Lists:	pgsql-hackers

On Tue, 2023-10-03 at 15:15 -0500, Nico Williams wrote:
> Ugh, my client is not displaying 'a' correctly

Ugh. Is that an argument in favor of normalization or against?

I've also noticed that some fonts render the same character a bit
differently depending on the constituent code points. For instance, if
the accent is its own code point, it seems to be more prominent than if
a single code point represents both the base character and the accent.
That seems to be a violation of canonical equivalence, but I can
understand why that might be useful.
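
To make the two encodings concrete, here is a minimal sketch using Python's
standard unicodedata module. It takes 'á' as an arbitrary example character;
the helper name same_text is made up for illustration and is not part of any
proposal in this thread.

```python
import unicodedata

# Two encodings of the same visible character (example choice: a-acute).
composed = "\u00e1"      # single code point: LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # base letter 'a' followed by COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code-point comparison treats the two strings as different.
print(composed == decomposed)           # False
print(len(composed), len(decomposed))   # 1 2

# After normalizing both sides they compare equal.
print(same_text(composed, decomposed))  # True
```

A font that draws these two sequences differently is rendering strings that
normalization treats as the same text.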

>
> Almost every Latin input mode out there produces precomposed
> characters
> and so they effectively produce NFC.

The problem is not the normal case, the problem will be things like
obscure input methods, some kind of software that's being too clever,
or some kind of malicious user trying to confuse the database.

>
> That means that indices
> need to normalize strings, but tables need to store unnormalized
> strings.

That's an interesting idea. Would the equality operator normalize
first, or are you saying that the index would need to recheck the
results?
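
The two options in that question can be sketched in a few lines of Python.
This is only a toy model under stated assumptions: a list stands in for the
table, a dict keyed by the NFC form stands in for the index, and the class
and method names are hypothetical. It says nothing about how PostgreSQL
would implement either behavior.

```python
import unicodedata


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


class NormalizedIndex:
    """Toy model: the table keeps strings as given, the index keys are NFC."""

    def __init__(self) -> None:
        self.rows: list[str] = []             # "table": unnormalized storage
        self.index: dict[str, list[int]] = {}  # NFC key -> row positions

    def insert(self, text: str) -> int:
        self.rows.append(text)
        row_id = len(self.rows) - 1
        self.index.setdefault(nfc(text), []).append(row_id)
        return row_id

    def lookup_normalized(self, text: str) -> list[str]:
        # Option 1: equality itself normalizes, so every index hit matches.
        return [self.rows[i] for i in self.index.get(nfc(text), [])]

    def lookup_recheck(self, text: str) -> list[str]:
        # Option 2: equality is code-point-wise, so the index is only a
        # filter and each candidate row is rechecked against the stored value.
        return [self.rows[i] for i in self.index.get(nfc(text), [])
                if self.rows[i] == text]


idx = NormalizedIndex()
idx.insert("\u00e1")    # precomposed form
idx.insert("a\u0301")   # decomposed form

print(len(idx.lookup_normalized("a\u0301")))  # 2
print(len(idx.lookup_recheck("a\u0301")))     # 1
```

With the first option the two stored rows are equal to each other as far as
queries can tell; with the second they remain distinct values that merely
share an index key.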

Regards,
Jeff Davis
