
Re: Pre-proposal: unicode normalized text

From:	Peter Eisentraut <peter(at)eisentraut(dot)org>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-02 08:47:48
Message-ID:	33a31c45-2e80-1270-771a-b75f3920a5ec@eisentraut.org
Lists:	pgsql-hackers

On 13.09.23 00:47, Jeff Davis wrote:
> The idea is to have a new data type, say "UTEXT", that normalizes the
> input so that it can have an improved notion of equality while still
> using memcmp().

I think a new type like this would obviously be suboptimal because it's
nonstandard and most people wouldn't use it.

I think a better direction here would be to work toward making
nondeterministic collations usable on the global/database level and then
encouraging users to use those.
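
To make the two directions concrete, here is a minimal sketch in Python rather than in PostgreSQL itself. It assumes NFC as the normalization form, and the helper names `to_utext` and `nondeterministic_equal` are hypothetical. The first helper normalizes once on input so that a plain byte comparison (the memcmp() analogue) is enough; the second leaves the stored strings untouched and does the equivalence work at comparison time, which is roughly how a nondeterministic collation behaves for canonically equivalent strings.

```python
import unicodedata


def to_utext(s: str) -> bytes:
    """Hypothetical stand-in for a UTEXT input function:
    normalize once (NFC assumed) and keep the UTF-8 bytes."""
    return unicodedata.normalize("NFC", s).encode("utf-8")


def nondeterministic_equal(a: str, b: str) -> bool:
    """Hypothetical stand-in for equality under a nondeterministic
    collation: inputs are stored as-is, and the equivalence work
    happens on every comparison."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

# The raw byte sequences differ, so a bytewise comparison on
# unnormalized text treats the two strings as different.
raw_bytes_equal = composed.encode("utf-8") == decomposed.encode("utf-8")

# Normalizing on input makes the bytewise comparison succeed.
utext_bytes_equal = to_utext(composed) == to_utext(decomposed)

# Comparing at query time reaches the same answer without
# changing what is stored.
collation_equal = nondeterministic_equal(composed, decomposed)
```

The sketch only illustrates where the cost is paid (once on input versus on each comparison); it says nothing about how large either cost is in PostgreSQL.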

It's also not clear which way the performance tradeoffs would fall.

Nondeterministic collations are obviously going to be slower, but by how
much? People have accepted moving from C locale to "real" locales
because they needed those semantics. Would it be any worse moving from
real locales to "even realer" locales?

On the other hand, a utext type would either require a large set of its
own functions and operators, or you would have to inject text-to-utext
casts in places, which would also introduce overhead.
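
As a rough illustration of that tradeoff, again in Python and with hypothetical names (`UText`, `utext_starts_with`), a separate normalized type needs its own operations, and any plain string has to be converted before it can take part in them. NFC is assumed as the normalization form.

```python
import unicodedata


class UText:
    """Hypothetical normalized-text type: the normalization cost is
    paid once, at construction, and equality is a byte comparison."""

    __slots__ = ("data",)

    def __init__(self, s: str):
        self.data = unicodedata.normalize("NFC", s).encode("utf-8")

    def __eq__(self, other):
        if not isinstance(other, UText):
            return NotImplemented  # no implicit mixing with plain str
        return self.data == other.data

    def __hash__(self):
        return hash(self.data)


def utext_starts_with(value: UText, prefix: UText) -> bool:
    """One of the many functions the new type would need for itself."""
    return value.data.startswith(prefix.data)


stored = UText("cafe\u0301")   # decomposed input, normalized on entry
plain = "caf\u00e9"            # ordinary text value, precomposed

# The plain string must be cast before it can be used with the
# type's own functions; that cast repeats the normalization work.
matches_after_cast = utext_starts_with(stored, UText(plain))

# Without the cast, the two types simply do not compare as equal.
equal_without_cast = stored == plain
```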

In response to

Pre-proposal: unicode normalized text at 2023-09-12 22:47:10 from Jeff Davis

Responses

Re: Pre-proposal: unicode normalized text at 2023-10-02 20:06:09 from Robert Haas
Re: Pre-proposal: unicode normalized text at 2023-10-03 22:55:32 from Jeff Davis
