Re: [18] clarify the difference between pg_wchar, wchar_t, and Unicode code points

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [18] clarify the difference between pg_wchar, wchar_t, and Unicode code points
Date: 2024-04-18 19:18:33
Message-ID: 8023d3a1-b7d2-4ae5-8ec2-833d1f86cd32@eisentraut.org
Lists: pgsql-hackers

On 16.04.24 01:40, Jeff Davis wrote:
> I'm not sure I understand all of the history behind pg_wchar, but it
> seems to be some blend of:
>
> (a) Postgres's own internal representation of a decoded character
> (b) libc's wchar_t
> (c) Unicode code point
>
> For example, Postgres has its own encoding/decoding routines, so (a) is
> the most obvious definition.

(a) is the correct definition, I think. The other ones are just
occasional conveniences, and occasionally wrong.
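To make (a) concrete: a pg_wchar is whatever Postgres's decoder for the current encoding produces, so the same character can yield different values. Below is a minimal sketch, not PostgreSQL's actual decoding functions (the names sketch_utf8_to_wchar and sketch_single_byte_to_wchar are hypothetical), that decodes the euro sign from UTF-8 and from LATIN9.

```c
#include <assert.h>
#include <stddef.h>

/* Matches PostgreSQL's definition in pg_wchar.h. */
typedef unsigned int pg_wchar;

/*
 * Hypothetical, simplified decoders for a single character.
 * No validation of malformed input beyond a length check.
 */

/* UTF-8: the resulting pg_wchar happens to equal the Unicode code point. */
static pg_wchar
sketch_utf8_to_wchar(const unsigned char *s, size_t len)
{
    assert(len >= 1);
    if (s[0] < 0x80)
        return s[0];
    if ((s[0] & 0xE0) == 0xC0 && len >= 2)
        return ((pg_wchar) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    if ((s[0] & 0xF0) == 0xE0 && len >= 3)
        return ((pg_wchar) (s[0] & 0x0F) << 12) |
               ((pg_wchar) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    if ((s[0] & 0xF8) == 0xF0 && len >= 4)
        return ((pg_wchar) (s[0] & 0x07) << 18) |
               ((pg_wchar) (s[1] & 0x3F) << 12) |
               ((pg_wchar) (s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 0;   /* malformed or truncated; real code would raise an error */
}

/* Single-byte encodings such as LATIN9: the pg_wchar is just the byte. */
static pg_wchar
sketch_single_byte_to_wchar(const unsigned char *s, size_t len)
{
    assert(len >= 1);
    return s[0];
}
```

With these, the euro sign decodes to 0x20AC from UTF-8 (E2 82 AC) but to 0xA4 from LATIN9, and 0xA4 as a Unicode code point is U+00A4 CURRENCY SIGN, not the euro. That is why treating pg_wchar as (c) is only safe under UTF-8.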

> When using ICU, we also pass a pg_wchar directly to ICU routines, which
> depends on definition (c), and can lead to problems like:
>
> https://www.postgresql.org/message-id/e7b67d24288f811aebada7c33f9ae629dde0def5.camel@j-davis.com

That's just a plain bug, I think. It's missing the encoding check that,
for example, pg_strncoll_icu() does.

> The comment at the top of regc_pg_locale.c explains some of the above,
> but not all. I'd like to organize this a bit better:
>
> * a new typedef for a Unicode code point ("codepoint"? "uchar"?)
> * a no-op conversion routine from pg_wchar to a codepoint that would
> assert that the server encoding is UTF-8 (#ifndef FRONTEND, of course)
> * a no-op conversion routine from pg_wchar to wchar_t that would be a
> good place for a comment describing that it's a "best effort" and may
> not be correct in all cases
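For illustration, the three proposals might look roughly like this. This is a hedged sketch only: pg_codepoint, pg_wchar_to_codepoint(), pg_wchar_to_wchar_t(), and the sketch_server_encoding variable are hypothetical stand-ins (the latter for GetDatabaseEncoding() == PG_UTF8), and standard assert() replaces PostgreSQL's Assert() so the fragment compiles on its own.

```c
#include <assert.h>
#include <wchar.h>

/* Matches PostgreSQL's definition in pg_wchar.h. */
typedef unsigned int pg_wchar;

/* Hypothetical typedef for a Unicode code point (name still undecided). */
typedef unsigned int pg_codepoint;

/* Hypothetical stand-in for the server encoding check. */
enum sketch_encoding { SKETCH_UTF8, SKETCH_LATIN9 };
static enum sketch_encoding sketch_server_encoding = SKETCH_UTF8;

/*
 * No-op conversion: a pg_wchar is a Unicode code point only if it was
 * decoded from UTF-8, so verify that in backend (non-FRONTEND) builds.
 */
static inline pg_codepoint
pg_wchar_to_codepoint(pg_wchar c)
{
#ifndef FRONTEND
    assert(sketch_server_encoding == SKETCH_UTF8);
#endif
    return (pg_codepoint) c;
}

/*
 * Best-effort no-op conversion to libc's wchar_t.  Not correct in all
 * cases: wchar_t is 16 bits on Windows, the C standard only promises
 * Unicode values when __STDC_ISO_10646__ is defined, and pg_wchar values
 * from non-UTF-8 encodings are not Unicode code points at all.
 */
static inline wchar_t
pg_wchar_to_wchar_t(pg_wchar c)
{
    return (wchar_t) c;
}
```

Both functions compile to nothing; their value is in giving the assertion and the "best effort" caveat a single, greppable home.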

I guess sometimes you really want to just store an array of Unicode code
points. But I'm not sure how this would actually address coding
mistakes like the one above. You still need to check the server
encoding and do encoding conversion when necessary.
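As a sketch of what "encoding conversion when necessary" means in the simplest case, here is a hypothetical helper (sketch_pg_wchar_to_codepoint, not an existing PostgreSQL function) covering UTF-8 and two single-byte encodings. It assumes single-byte pg_wchar values are the raw byte; LATIN1 then maps to code points unchanged, while LATIN9 (ISO 8859-15) differs from LATIN1 at eight byte positions that need a table lookup.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int pg_wchar;      /* as in PostgreSQL */
typedef unsigned int pg_codepoint;  /* hypothetical */

enum sketch_encoding { SKETCH_UTF8, SKETCH_LATIN1, SKETCH_LATIN9 };

/* The byte positions where ISO 8859-15 differs from ISO 8859-1. */
static const struct { unsigned char byte; pg_codepoint cp; } latin9_diffs[] = {
    {0xA4, 0x20AC},   /* EURO SIGN */
    {0xA6, 0x0160},   /* LATIN CAPITAL LETTER S WITH CARON */
    {0xA8, 0x0161},   /* LATIN SMALL LETTER S WITH CARON */
    {0xB4, 0x017D},   /* LATIN CAPITAL LETTER Z WITH CARON */
    {0xB8, 0x017E},   /* LATIN SMALL LETTER Z WITH CARON */
    {0xBC, 0x0152},   /* LATIN CAPITAL LIGATURE OE */
    {0xBD, 0x0153},   /* LATIN SMALL LIGATURE OE */
    {0xBE, 0x0178},   /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
};

/* Convert a pg_wchar decoded from 'enc' into a Unicode code point. */
static pg_codepoint
sketch_pg_wchar_to_codepoint(pg_wchar c, enum sketch_encoding enc)
{
    switch (enc)
    {
        case SKETCH_UTF8:
            return c;   /* already a code point */
        case SKETCH_LATIN1:
            return c;   /* LATIN1 bytes coincide with U+0000..U+00FF */
        case SKETCH_LATIN9:
            for (size_t i = 0; i < sizeof(latin9_diffs) / sizeof(latin9_diffs[0]); i++)
                if (latin9_diffs[i].byte == c)
                    return latin9_diffs[i].cp;
            return c;   /* all other LATIN9 bytes match LATIN1 */
    }
    return (pg_codepoint) -1;   /* unknown encoding */
}
```

A typedef alone would not prevent passing a LATIN9 pg_wchar of 0xA4 straight to ICU; some step like this, chosen by the server encoding, still has to run first.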
