Re: The "char" type versus non-ASCII characters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chapman Flack <chap(at)anastigmatix(dot)net>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: The "char" type versus non-ASCII characters
Date: 2021-12-05 19:51:53
Message-ID: 2954046.1638733913@sss.pgh.pa.us
Lists: pgsql-hackers

Chapman Flack <chap(at)anastigmatix(dot)net> writes:
> On 12/05/21 12:01, Tom Lane wrote:
>> regression=# select '\'::bytea;
>> ERROR: invalid input syntax for type bytea
>>
>> which would be incompatible with "char"'s existing behavior. But as
>> long as we don't do that, I'd be okay with having high-bit-set char
>> values map to backslash-followed-by-three-octal-digits, which is
>> what bytea escape format would produce.

> Is that a proposal to change nothing about the current treatment
> of values < 128, or just to avoid rejecting bare '\'?

I intended to change nothing about charin's treatment of ASCII
characters, nor anything about bytea's behavior. I don't think
we should relax the error checks in the latter. That does mean
that backslash becomes a problem for the idea of transparent
conversion from char to bytea or vice versa. We could think
about emitting backslash as '\\' in charout, I suppose. I'm
not really convinced though that bytea compatibility is worth
changing a case that's non-problematic today.
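
For illustration only, here is a minimal C sketch of the output rule being discussed. It assumes a one-byte "char" value. ASCII bytes pass through unchanged, including NUL, which prints as an empty string. High-bit-set bytes become a backslash plus three octal digits. A hypothetical double_backslash flag models the optional idea of printing a backslash as '\\'. The name sketch_charout is invented here, and this is not the actual charout source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of the proposed "char" output rule; not the actual
 * PostgreSQL charout source.  "out" must have room for at least 5 bytes
 * (backslash, three octal digits, terminating NUL).
 */
static void
sketch_charout(char ch, bool double_backslash, char *out)
{
    unsigned char uc = (unsigned char) ch;

    assert(out != NULL);

    if (uc & 0x80)
    {
        /* High bit set: backslash plus three octal digits, as bytea escape output does. */
        snprintf(out, 5, "\\%03o", (unsigned int) uc);
    }
    else if (uc == '\\' && double_backslash)
    {
        /* Optional, hypothetical variant: print a backslash as \\ (bytea-compatible). */
        strcpy(out, "\\\\");
    }
    else
    {
        /* Plain ASCII is unchanged; NUL yields an empty string. */
        out[0] = (char) uc;
        out[1] = '\0';
    }
}
```

With double_backslash left false, a bare backslash still prints as '\'. That keeps today's behavior, but it means the output is not valid bytea input.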

> If there's a way to factor out and reuse the good parts of byteain,
> that would mean '\\' would also be accepted to mean a backslash,
> and the \r \n \t usual escapes would be accepted too, and \ooo and
> \xhh.

Uh, what?

regression=# select '\n'::bytea;
ERROR: invalid input syntax for type bytea

But I doubt that sharing code here would be worth the trouble.
The vast majority of byteain is concerned with managing the
string length, which is a nonissue for charin.
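
To make the point concrete, here is a hypothetical sketch of a charin that also accepts the \ooo form. It assumes the existing rule is kept for any other input: return the first byte, so '' gives 0 and a bare '\' stays legal. Because the result is always exactly one byte, no length bookkeeping is needed. The names are invented for this example. Restricting the first octal digit to 0-3, as bytea does, is an assumption that keeps the decoded value within one byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True for '0'..'7'. */
static bool
is_octal_digit(char c)
{
    return c >= '0' && c <= '7';
}

/*
 * Hypothetical charin variant (not PostgreSQL source): exactly "\ooo",
 * with a first digit of 0-3 so the value fits in one byte, decodes to
 * that byte; any other input keeps the existing rule of returning its
 * first byte.  No length bookkeeping is needed.
 */
static char
sketch_charin(const char *s)
{
    assert(s != NULL);

    if (strlen(s) == 4 && s[0] == '\\' &&
        s[1] >= '0' && s[1] <= '3' &&
        is_octal_digit(s[2]) && is_octal_digit(s[3]))
        return (char) (((s[1] - '0') << 6) | ((s[2] - '0') << 3) | (s[3] - '0'));

    return s[0];   /* '' gives NUL; a bare '\' is still accepted */
}
```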

> I think it ends up being no more complexity at all, because a single
> octet in bytea-hex form looks like \xhh, which is exactly what
> a single \xhh in bytea-escape form looks like.

I'm confused by this statement too. AFAIK the alternatives in
bytea are \xhh or \ooo:

regression=# select '\xEE'::bytea;
bytea
-------
\xee
(1 row)

regression=# set bytea_output to escape;
SET
regression=# select '\xEE'::bytea;
bytea
-------
\356
(1 row)
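
Here is a small C sketch, assuming the documented bytea output rules, of how a single octet such as 0xEE is rendered under each bytea_output setting. Hex format writes \x plus two lowercase hex digits. Escape format writes printable ASCII as-is, doubles a backslash, and uses \ooo for everything else. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers showing how a one-byte bytea value is printed under
 * each bytea_output setting.  "out" needs room for 5 bytes.
 */
static void
one_octet_hex(unsigned char b, char *out)
{
    assert(out != NULL);
    /* hex format: a single \x prefix, then two lowercase hex digits per byte */
    snprintf(out, 5, "\\x%02x", (unsigned int) b);
}

static void
one_octet_escape(unsigned char b, char *out)
{
    assert(out != NULL);
    if (b == '\\')
        strcpy(out, "\\\\");                             /* backslash is doubled */
    else if (b < 0x20 || b > 0x7e)
        snprintf(out, 5, "\\%03o", (unsigned int) b);    /* non-printable: \ooo */
    else
    {
        out[0] = (char) b;                               /* printable ASCII as-is */
        out[1] = '\0';
    }
}
```

For a one-byte value the hex form happens to look like \xhh. In general, though, the \x prefix appears once for the whole value, not once per byte.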

regards, tom lane
