| From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
|---|---|
| To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
| Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: C11: should we use char32_t for unicode code points? |
| Date: | 2025-10-28 20:03:33 |
| Message-ID: | CA+hUKGLWggvAW+ZK=P1ZoUBgS8EhodpA7ipeGuq2-3HePjjXDw@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Oct 29, 2025 at 7:45 AM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
> On 26.10.25 20:43, Jeff Davis wrote:
> > +/*
> > + * char16_t and char32_t
> > + * Unicode code points.
> > + */
> > +#ifndef __cplusplus
> > +#ifdef HAVE_UCHAR_H
> > +#include <uchar.h>
> > +#ifndef __STDC_UTF_16__
> > +#error "char16_t must use UTF-16 encoding"
> > +#endif
> > +#ifndef __STDC_UTF_32__
> > +#error "char32_t must use UTF-32 encoding"
> > +#endif
> > +#else
> > +typedef uint16_t char16_t;
> > +typedef uint32_t char32_t;
> > +#endif
> > +#endif
>
> This could be improved a bit. The reason for some of these conditionals
> is not clear. Like, what does __cplusplus have to do with this? I
> think it would be more correct to write a configure/meson check for the
> actual types rather than depend indirectly on a header check.

I suggested testing __cplusplus because I predicted that that typedef
would fail on a C++ compiler (since C++11), where char32_t is a
language keyword identifying a distinct type requiring no #include.
The missing header is an Apple-only problem; without it we could just
include <uchar.h> unconditionally, and presumably we eventually will, once
Apple supplies this non-optional-per-C11 header. On a Mac, #include
<uchar.h> fails for C (there is no $SDK/usr/include/uchar.h) but works
for C++ (it finds $SDK/usr/include/c++/v1/uchar.h). Since we'd probe for
HAVE_UCHAR_H with the C compiler, we'd not find it, and thus we'd
also need to exclude __cplusplus at compile time. Otherwise, let's
see what the error looks like...

test.cpp:2:22: error: cannot combine with previous 'int' declaration specifier
    2 | typedef unsigned int char32_t;
      |                      ^
test.cpp:2:1: warning: typedef requires a name [-Wmissing-declarations]
    2 | typedef unsigned int char32_t;
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning and 1 error generated.

GCC has a clearer message:

test.cpp:2:22: error: redeclaration of C++ built-in type 'char32_t'
[-fpermissive]
    2 | typedef unsigned int char32_t;
      |                      ^~~~~~~~

If you try to test for the existence of the type rather than the
header in meson/configure, won't you still have the configure-with-C
compile-with-C++ problem, with no way to resolve it except by keeping
the test for __cplusplus that you're trying to get rid of? So what do
you gain other than more lines of configure stuff?
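
For concreteness, here is a minimal sketch of the guard being discussed, with the __STDC_UTF_* checks left out for brevity. It assumes HAVE_UCHAR_H would be defined by a configure/meson probe run with the C compiler; it is left undefined here, so the fallback typedefs are what actually get compiled. demo_is_bmp() is a hypothetical helper added only to give the types a use; it is not part of the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: HAVE_UCHAR_H comes from a header probe done with the C
 * compiler.  It is not defined in this sketch, so the #else branch is taken.
 */
#ifndef __cplusplus
#ifdef HAVE_UCHAR_H
#include <uchar.h>
#else
typedef uint16_t char16_t;
typedef uint32_t char32_t;
#endif
#endif

/* Whichever branch supplied the type, it must hold any code point. */
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t too narrow");

/* Hypothetical helper: true if cp is in the Basic Multilingual Plane. */
static inline bool
demo_is_bmp(char32_t cp)
{
	return cp <= 0xFFFF;
}
```

With the __cplusplus test in place, a C++ compiler skips both the #include and the typedefs and uses its built-in char32_t, so the same text should be acceptable to either compiler regardless of what the C-side probe found.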
Out of curiosity, even with -std=c++03 (old C++ standard that might
not work for PostgreSQL for other reasons, but I wanted to see what
would happen with a standard before char32_t became a fundamental
language type) I was surprised to see that the standard library
supplied char32_t. It incorrectly(?) imports a typename from the
future standards using an internal type, so our typedef still fails,
just with a different Clang error:

test.cpp:2:22: error: typedef redefinition with different types
('unsigned int' vs 'char32_t')
    2 | typedef unsigned int char32_t;
      |                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__config:320:20:
note: previous definition is here
  320 | typedef __char32_t char32_t;
      |                    ^

> The checks for __STDC_UTF_16__ and __STDC_UTF_32__ can be removed, as
> was discussed elsewhere, since we don't use any standard library
> functions that make use of these facts, and the need goes away with C23
> anyway.
+1