Re: Unicode escapes with any backend encoding

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Chapman Flack <chap(at)anastigmatix(dot)net>
Subject: Re: Unicode escapes with any backend encoding
Date: 2020-01-14 02:05:34
Message-ID: 24911.1578967534@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> On Tue, Jan 14, 2020 at 10:02 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Grepping for other direct uses of unicode_to_utf8(), I notice that
>> there are a couple of places in the JSON code where we have a similar
>> restriction that you can only write a Unicode escape in UTF8 server
>> encoding. I'm not sure whether these same semantics could be
>> applied there, so I didn't touch that.

> Off the cuff I'd be inclined to say we should keep the text escape
> rules the same. We've already extended the JSON standard by allowing
> non-UTF8 encodings.

Right. I'm just thinking though that if you can write "é" literally
in a JSON string, even though you're using LATIN1 not UTF8, then why
not allow writing that as "\u00E9" instead? The latter is arguably
truer to spec.
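
For illustration, a sketch assuming a LATIN1 database and today's
unpatched behavior (I believe this is what you'd see):

    SELECT '"é"'::jsonb;       -- accepted, é is representable in LATIN1
    SELECT '"\u00E9"'::jsonb;  -- rejected: "unsupported Unicode escape sequence"

so the spelled-out escape is the one form we refuse.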

However, if JSONB collapses "\u00E9" to LATIN1 "é", that would be bad,
unless we have a way to undo it on printout. So there might be
some more moving parts here than I thought.
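
For reference, the collapsing is already visible with UTF8 server
encoding today, if I'm remembering the behavior right: jsonb resolves
the escape on input while plain json keeps the source text, e.g.

    SELECT '"\u00E9"'::jsonb;  -- prints "é"
    SELECT '"\u00E9"'::json;   -- prints "\u00E9"

With a LATIN1 server encoding the jsonb case would presumably collapse
the same way, and nothing in the stored value records that it started
life as an escape.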

regards, tom lane
