Multi-byte character case-folding

From:	Thom Brown <thom(at)linux(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Multi-byte character case-folding
Date:	2020-07-06 17:35:10
Message-ID:	CAA-aLv5nFfHd72H97u=OnGEsXVn3s-JV-jzMr-HeUePQgX4cEA@mail.gmail.com
Lists:	pgsql-hackers

Hi,

At the moment, only single-byte characters in unquoted identifiers are
case-folded; multi-byte characters are left exactly as written.

For example, the unquoted identifier abĉDĚF is case-folded to "abĉdĚf".
It can then be referred to unquoted as abĉdĚf or ABĉDĚF, but not as
abĉděf or ABĈDĚF, because ĉ/Ĉ and ě/Ě are never folded.

downcase_identifier() has the following comment:

/*
 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 * have the infrastructure for. Instead we use tolower() to provide a
 * locale-aware translation. However, there are some locales where this
 * is not right either (eg, Turkish may do strange things with 'i' and
 * 'I'). Our current compromise is to use tolower() for characters with
 * the high bit set, as long as they aren't part of a multi-byte
 * character, and use an ASCII-only downcasing for 7-bit characters.
 */
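
To make the compromise in that comment concrete, here is a minimal
standalone sketch of the byte-wise rule as I read it. It is not the
PostgreSQL source: the function name fold_identifier_sketch, the
caller-supplied output buffer and the single_byte_encoding flag are all
assumptions made for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative sketch only (hypothetical helper, not PostgreSQL code).
 * Folds src into dst one byte at a time:
 *   - 7-bit 'A'..'Z' are downcased with plain ASCII arithmetic;
 *   - high-bit bytes go through tolower(), but only when the caller says
 *     the encoding is single-byte (assumed flag);
 *   - every other byte, including each byte of a multi-byte character,
 *     is copied unchanged.
 * Output is truncated to fit dstlen and is always NUL-terminated.
 */
void
fold_identifier_sketch(const char *src, char *dst, size_t dstlen,
                       bool single_byte_encoding)
{
    size_t i;

    if (dstlen == 0)
        return;

    for (i = 0; src[i] != '\0' && i + 1 < dstlen; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';            /* ASCII-only downcasing */
        else if (single_byte_encoding && (ch & 0x80) != 0 && isupper(ch))
            ch = (unsigned char) tolower(ch);   /* locale-aware */

        dst[i] = (char) ch;
    }
    dst[i] = '\0';
}
```

Called with the UTF-8 bytes of abĉDĚF and single_byte_encoding set to
false, this yields the bytes of abĉdĚf: D and F are folded, while the
two-byte sequences for ĉ and Ě pass through untouched, which matches the
behaviour described above.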

So my question is: do we now have the infrastructure to make
case-folding consistent across all character widths?

Thanks

Thom

Responses

Re: Multi-byte character case-folding at 2020-07-06 20:33:51 from Tom Lane
