Re: The "char" type versus non-ASCII characters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: The "char" type versus non-ASCII characters
Date: 2021-12-03 19:42:11
Message-ID: 2320640.1638560531@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 12/3/21 14:12, Tom Lane wrote:
>> I can think of at least three ways we might address this:
>>
>> * Forbid all non-ASCII values for type "char". This results in
>> simple and portable semantics, but it might break usages that
>> work okay today.
>>
>> * Allow such values only in single-byte server encodings. This
>> is a bit messy, but it wouldn't break any cases that are not
>> problematic already.
>>
>> * Continue to allow non-ASCII values, but change charin/charout,
>> char_text, etc so that the external representation is encoding-safe
>> (perhaps make it an octal or decimal number).

> I don't like #2.

Yeah, it's definitely messy --- for example, maybe é works in
a latin1 database but is rejected when you try to restore into
a DB with utf8 encoding.

> Is #3 going to change the external representation only
> for non-ASCII values? If so, that seems OK.

Right, I envisioned that ASCII behaves the same but we'd use
a numeric representation for high-bit-set values. These
cases could be told apart fairly easily by charin(), since
the numeric representation would always be three digits.
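
To illustrate how the two cases could be told apart, here is a minimal
sketch in C. It is not the actual charin()/charout() code: the function
names sketch_char_in and sketch_char_out are hypothetical, and the choice
of octal (one of the two bases suggested upthread) is an assumption.
Octal has the convenient property that every high-bit-set byte (128..255)
prints as exactly three digits, 200 through 377.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical output routine: ASCII bytes are emitted unchanged,
 * high-bit-set bytes as exactly three octal digits.
 * buf must have room for at least 4 bytes.
 */
static void
sketch_char_out(unsigned char c, char *buf)
{
    assert(buf != NULL);
    if (c < 0x80)
    {
        buf[0] = (char) c;
        buf[1] = '\0';
    }
    else
        snprintf(buf, 4, "%03o", (unsigned int) c);
}

/*
 * Hypothetical input routine: a string of exactly three octal digits
 * whose value is in 128..255 is read as the numeric form; anything
 * else falls back to taking the first byte of the input.
 */
static unsigned char
sketch_char_in(const char *s)
{
    if (strlen(s) == 3 &&
        s[0] >= '0' && s[0] <= '7' &&
        s[1] >= '0' && s[1] <= '7' &&
        s[2] >= '0' && s[2] <= '7')
    {
        int v = (s[0] - '0') * 64 + (s[1] - '0') * 8 + (s[2] - '0');

        if (v >= 0x80 && v <= 0xFF)
            return (unsigned char) v;
    }
    return (unsigned char) s[0];
}
```

In this sketch a three-digit string whose value is below 128, such as
"123", still falls back to the first-byte rule, so the special handling
applies only to the high-bit-set range.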

> #1 is the simplest to implement and to understand,
> and I suspect it would break very little in practice, but others might
> disagree with that assessment.

We'd still have to decide what to do with pg_upgrade'd
non-ASCII values, so there's messiness there too.
Having charout() throw an error seems not very nice.

regards, tom lane
