Re: jsonb, unicode escapes and escaped backslashes

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb, unicode escapes and escaped backslashes
Date: 2015-01-30 06:53:01
Message-ID: CAM3SWZQSXAML_3UbA8_sCeGTUD+ZyfL0ubSgtqySi+bk+KcRCQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 29, 2015 at 10:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I made the \u0000 error be errcode(ERRCODE_INVALID_TEXT_REPRESENTATION)
> and errmsg("invalid input syntax for type json"), by analogy to what's
> thrown for non-ASCII Unicode escapes in non-UTF8 encoding. I'm not
> terribly happy with that, though. ISTM that for both cases, this is
> not "invalid syntax" at all, but an implementation restriction that
> forces us to reject perfectly valid syntax. So I think we ought to
> use a different ERRCODE and text message, though I'm not entirely
> sure what it should be instead. ERRCODE_FEATURE_NOT_SUPPORTED is
> one possibility.

I personally prefer what you have here.

The point of JSONB is that we take a position on certain aspects like
this. We're bridging a pointedly loosey-goosey interchange format,
JSON, with native PostgreSQL types. For example, we take a firm
position on encoding. The JSON type is a bit more permissive, to about
the extent that that's possible. The whole point is that we're
interpreting JSON data in a way that's consistent with *Postgres*
conventions. You'd have to interpret the data according to *some*
convention in order to do something non-trivial with it in any case,
and users usually want that.
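
To make the distinction concrete: a \u0000 escape is well-formed JSON,
and a generic parser accepts it. The restriction comes from the target
type, since a PostgreSQL text value cannot hold a NUL character. A
minimal Python sketch of that idea follows; contains_nul is a
hypothetical helper for illustration only, not anything in PostgreSQL:

```python
import json


def contains_nul(value):
    """Return True if any string (key or value) in parsed JSON holds U+0000.

    Hypothetical helper: it only mimics the kind of check a stricter
    type could apply after parsing. It is not PostgreSQL's code.
    """
    if isinstance(value, str):
        return "\x00" in value
    if isinstance(value, list):
        return any(contains_nul(item) for item in value)
    if isinstance(value, dict):
        return any(contains_nul(k) or contains_nul(v) for k, v in value.items())
    return False  # numbers, booleans and null cannot contain NUL


# The escape is syntactically valid JSON, so parsing succeeds.
parsed = json.loads(r'{"a": "x\u0000y"}')

# A store with a "no NUL in text" rule would have to reject it anyway.
print(contains_nul(parsed))
```

The parse step and the restriction are separate concerns here, which is
why the choice of error code is a judgment call and not a given.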

It's really nice the way encoding is a strict implementation detail
within Postgres in general, in the sense that you know that if your
application code is concerned about encoding, you're probably thinking
about the problem incorrectly (at least once data has crossed the
database encoding "border"). MySQL's lackadaisical attitude here appears
to have been an enormous mistake.
--
Peter Geoghegan
