| From: | Jeff Davis <pgsql(at)j-davis(dot)com> | 
|---|---|
| To: | Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Pre-proposal: unicode normalized text | 
| Date: | 2023-10-03 22:55:32 | 
| Message-ID: | b28354e5b228ef3ec742112e11442486718336af.camel@j-davis.com | 
| Lists: | pgsql-hackers | 
On Mon, 2023-10-02 at 10:47 +0200, Peter Eisentraut wrote:
> I think a better direction here would be to work toward making
> nondeterministic collations usable on the global/database level and
> then encouraging users to use those.
>
> It's also not clear which way the performance tradeoffs would fall.
>
> Nondeterministic collations are obviously going to be slower, but by
> how much?  People have accepted moving from C locale to "real" locales
> because they needed those semantics.  Would it be any worse moving
> from real locales to "even realer" locales?
If you normalize first, then you can get some semantic improvements
without giving up on the stability and performance of memcmp(). That
seems like a win with zero costs in terms of stability or performance
(except perhaps some extra text->utext casts).
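To make the normalize-first idea concrete, here is a rough sketch in Python rather than in PostgreSQL itself. It is not the proposed utext implementation; the helper name `normalized_bytes` and the choice of NFC are assumptions made only for illustration. It shows two canonically equivalent spellings of "é" that differ bytewise until they are normalized, after which a plain byte comparison (the moral equivalent of memcmp()) treats them as equal.

```python
import unicodedata


def normalized_bytes(s: str) -> bytes:
    """Hypothetical helper: NFC-normalize, then encode as UTF-8.

    The resulting bytes can be compared with a plain bytewise
    comparison, with no collation library involved.
    """
    return unicodedata.normalize("NFC", s).encode("utf-8")


composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# Without normalization the two spellings are different byte strings.
raw_equal = composed.encode("utf-8") == decomposed.encode("utf-8")

# After normalization a bytewise comparison sees them as identical.
normalized_equal = normalized_bytes(composed) == normalized_bytes(decomposed)

print(raw_equal, normalized_equal)
```

The normalization cost is paid once, when the value is produced, and every later comparison stays a bytewise one.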
Going to a "real" locale gives more semantic benefits but at a very
high cost: depending on a collation provider library, dealing with
collation changes, and accepting performance costs. While supporting
the use of nondeterministic collations at the database level may be a
good idea, it doesn't help reach the compromise that I'm aiming for in
this thread.
Regards,
	Jeff Davis