Re: The "char" type versus non-ASCII characters

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: The "char" type versus non-ASCII characters
Date: 2021-12-03 20:11:11
Message-ID: c2f8bd87-daab-9a67-74c6-35970b353508@dunslane.net
Lists: pgsql-hackers


On 12/3/21 14:42, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 12/3/21 14:12, Tom Lane wrote:
>>> I can think of at least three ways we might address this:
>>>
>>> * Forbid all non-ASCII values for type "char". This results in
>>> simple and portable semantics, but it might break usages that
>>> work okay today.
>>>
>>> * Allow such values only in single-byte server encodings. This
>>> is a bit messy, but it wouldn't break any cases that are not
>>> problematic already.
>>>
>>> * Continue to allow non-ASCII values, but change charin/charout,
>>> char_text, etc so that the external representation is encoding-safe
>>> (perhaps make it an octal or decimal number).
>> Is #3 going to change the external representation only
>> for non-ASCII values? If so, that seems OK.
> Right, I envisioned that ASCII behaves the same but we'd use
> a numeric representation for high-bit-set values. These
> cases could be told apart fairly easily by charin(), since
> the numeric representation would always be three digits.

OK, this seems the most attractive. Can we also allow 2 hex digits?
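
For illustration only, here is a minimal sketch of how option 3 might look, assuming the three-digit form is octal (Tom's message only says "perhaps octal or decimal"). The function names char_out_sketch and char_in_sketch are hypothetical and are not the actual charin/charout code. ASCII bytes pass through unchanged, while high-bit-set bytes are written as exactly three octal digits (200 to 377), which is what lets the input side tell the two cases apart.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical output routine: write the external form of one "char"
 * value into buf, which must hold at least 4 bytes. ASCII bytes are
 * emitted unchanged; high-bit-set bytes become three octal digits.
 * (Octal is an assumption; the thread leaves the base open.)
 */
static void
char_out_sketch(unsigned char c, char *buf)
{
    assert(buf != NULL);
    if (c < 0x80)
    {
        buf[0] = (char) c;
        buf[1] = '\0';
    }
    else
        snprintf(buf, 4, "%03o", (unsigned int) c);
}

/*
 * Hypothetical input routine: if the string is exactly three octal
 * digits whose value is 128..255, treat it as the numeric form.
 * Otherwise fall back to taking the first byte of the input.
 */
static unsigned char
char_in_sketch(const char *s)
{
    assert(s != NULL);
    if (strlen(s) == 3 &&
        s[0] >= '2' && s[0] <= '3' &&
        s[1] >= '0' && s[1] <= '7' &&
        s[2] >= '0' && s[2] <= '7')
    {
        /* First digit 2 or 3 guarantees a value in 128..255. */
        return (unsigned char) ((s[0] - '0') * 64 +
                                (s[1] - '0') * 8 +
                                (s[2] - '0'));
    }
    return (unsigned char) s[0];
}
```

In this sketch a three-character input such as "123" is not in the high-bit range, so it falls through to the first-byte rule. A two-hex-digit form would need its own disambiguation rule against ordinary two-character input, which the sketch does not attempt.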

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
