Re: C11: should we use char32_t for unicode code points?

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	pgsql(at)j-davis(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: C11: should we use char32_t for unicode code points?
Date:	2025-10-24 09:43:15
Message-ID:	20251024.184315.1449345234035166124.ishii@postgresql.org
Lists:	pgsql-hackers

> Now that we're using C11, should we use char32_t for unicode code
> points?
>
> Right now, we use pg_wchar for two purposes:
>
> 1. to abstract away some problems with wchar_t on platforms where
> it's 16 bits; and
> 2. hold unicode code point values
>
> In UTF8, they are equivalent and can be freely cast back and forth,
> but not necessarily in other encodings. That can be confusing in some
> contexts. Attached is a patch to use char32_t for the second purpose.
>
> Both are equivalent to uint32, so there's no functional change and no
> actual typechecking, it's just for readability.
>
> Is this helpful, or needless code churn?

Unless char32_t is used solely for Unicode code point data, I think
it would be better to define something like "pg_unicode" and use that
instead of using char32_t directly, because it would be clearer for
people reading the code.
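
As a rough illustration of that suggestion, a dedicated alias could be
sketched as below. This is only an assumption about how it might look:
"pg_unicode" is just the placeholder name floated above, and the helper
pg_unicode_is_valid() is hypothetical, not part of the attached patch or
of the existing tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <uchar.h>

/*
 * Hypothetical alias: documents that a value is a Unicode code point.
 * It is still plain char32_t underneath, so the compiler performs no
 * extra type checking; the benefit is readability only.
 */
typedef char32_t pg_unicode;

/* The alias must be able to hold every 32-bit value. */
static_assert((pg_unicode) -1 >= 0xFFFFFFFF,
			  "pg_unicode must be at least 32 bits wide");

/*
 * Hypothetical helper: true if cp is a Unicode scalar value, i.e. it is
 * no larger than U+10FFFF and is not a UTF-16 surrogate.
 */
static inline bool
pg_unicode_is_valid(pg_unicode cp)
{
	return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

With an alias like this, a signature taking pg_unicode says "code point"
at a glance, while pg_wchar can remain the encoding-dependent wide
character type.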

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

C11: should we use char32_t for unicode code points? at 2025-10-23 18:15:54 from Jeff Davis

Responses

Re: C11: should we use char32_t for unicode code points? at 2025-10-24 15:25:27 from Jeff Davis