C11: should we use char32_t for unicode code points?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: C11: should we use char32_t for unicode code points?
Date: 2025-10-23 18:15:54
Message-ID: bedcc93d06203dfd89815b10f815ca2de8626e85.camel@j-davis.com
Lists: pgsql-hackers

Now that we're using C11, should we use char32_t for Unicode code
points?

Right now, we use pg_wchar for two purposes:

1. to abstract away some problems with wchar_t on platforms where
it's 16 bits; and
2. to hold Unicode code point values.

In UTF-8, the two uses are equivalent and values can be freely cast back
and forth, but that is not necessarily true in other encodings. That can
be confusing in some contexts. Attached is a patch to use char32_t for
the second purpose.

Both types are equivalent to uint32, so there's no functional change and
no actual type checking; it's just for readability.
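
To illustrate the point about readability without type checking, here is a
minimal sketch. It is not taken from the patch: demo_wchar is a hypothetical
stand-in for pg_wchar (assumed to be a 32-bit unsigned typedef), and the
helper functions are made up for the example.

```c
#include <stdint.h>
#include <uchar.h>				/* C11: declares char32_t */

/* Hypothetical stand-in for pg_wchar; assumed to be a 32-bit unsigned typedef. */
typedef uint32_t demo_wchar;

/*
 * char32_t is a typedef for uint_least32_t. On common platforms that is
 * exactly 32 bits wide, so the two types have the same size.
 */
_Static_assert(sizeof(char32_t) == sizeof(demo_wchar),
			   "char32_t and demo_wchar are assumed to be the same size");

/* Hypothetical helper: the signature documents that a code point is expected. */
static int
demo_is_supplementary(char32_t cp)
{
	/* Code points outside the Basic Multilingual Plane. */
	return cp >= 0x10000 && cp <= 0x10FFFF;
}

/* Implicit conversion between the two typedefs; no cast, no diagnostic. */
static char32_t
demo_to_code_point(demo_wchar w)
{
	return w;
}

/*
 * Passing a demo_wchar where a char32_t is expected also compiles cleanly,
 * so the compiler will not catch a value that is not really a code point.
 */
static int
demo_wchar_is_supplementary(demo_wchar w)
{
	return demo_is_supplementary(w);
}
```

Because both names are integer typedefs, the only thing that changes is what
the signature tells the reader about the intended contents of the value.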

Is this helpful, or needless code churn?

Regards,
Jeff Davis

Attachment: v1-0001-Use-C11-char32_t-for-Unicode-code-points.patch (text/x-patch, 50.0 KB)
