Re: jsonb, unicode escapes and escaped backslashes

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb, unicode escapes and escaped backslashes
Date: 2015-01-30 08:16:29
Message-ID: CAM3SWZR7uq+ogmPm1ofGTkCWFRHX1BREAskZdOmSRp1E6N04xA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 29, 2015 at 11:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The point of JSONB is that we take a position on certain aspects like
>> this. We're bridging a pointedly loosey goosey interchange format,
>> JSON, with native PostgreSQL types. For example, we take a firm
>> position on encoding. The JSON type is a bit more permissive, to about
>> the extent that that's possible. The whole point is that we're
>> interpreting JSON data in a way that's consistent with *Postgres*
>> conventions. You'd have to interpret the data according to *some*
>> convention in order to do something non-trivial with it in any case,
>> and users usually want that.
>
> I quite agree with you, actually, in terms of that perspective.

Sure, but I wasn't sure that that was evident to others.

To emphasize: I think it's appropriate that the JSON spec takes
something of a back-seat approach to things like encoding and the
precision of numbers. I also think it's appropriate that JSONB does
not, up to and including where JSONB forbids things that the JSON spec
supposes could be useful. We haven't failed users by (say) not
accepting NULs, even though the spec suggests that that might be
useful - we have provided them with a reasonable, concrete
interpretation of that JSON data, with lots of useful operators, that
they may take or leave. It really isn't just a historical accident that
we have both a JSON and a JSONB type. For other examples of this, see every "document
database" in existence.

Depart from this perspective, as an interchange standard author, and
you end up with something like XML, which, while easy to reason about,
isn't all that useful, or BSON, the binary interchange format, which
is an oxymoron.

> But my point remains: "\u0000" is not invalid JSON syntax, and neither is
> "\u1234". If we choose to throw an error because we can't interpret or
> process that according to our conventions, fine, but we should call it
> something other than "invalid syntax".
>
> ERRCODE_UNTRANSLATABLE_CHARACTER or ERRCODE_CHARACTER_NOT_IN_REPERTOIRE
> seem more apropos from here.

I see. I'd go with ERRCODE_UNTRANSLATABLE_CHARACTER, then.
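
To illustrate the distinction being drawn, here is a minimal sketch in
Python (not the actual C parser). It assumes a UTF-8 server encoding, so
"\u1234" is representable and only "\u0000" is rejected. The names
StoreError and strict_loads are hypothetical; the SQLSTATE values are the
ones I believe the error codes appendix lists for these conditions.

```python
import json

# SQLSTATE values as listed in the PostgreSQL error codes appendix
# (assumed here for illustration).
INVALID_TEXT_REPRESENTATION = "22P02"   # malformed JSON text
UNTRANSLATABLE_CHARACTER = "22P05"      # valid JSON we cannot represent


class StoreError(Exception):
    """Hypothetical error type carrying a SQLSTATE-like code."""

    def __init__(self, sqlstate, message):
        super().__init__(message)
        self.sqlstate = sqlstate


def _contains_nul(value):
    """Return True if any string or object key in the parsed value holds U+0000."""
    if isinstance(value, str):
        return "\x00" in value
    if isinstance(value, list):
        return any(_contains_nul(v) for v in value)
    if isinstance(value, dict):
        return any(_contains_nul(k) or _contains_nul(v) for k, v in value.items())
    return False


def strict_loads(text):
    """Parse JSON, separating syntax errors from values we refuse to store."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        # The text is not JSON at all: a genuine syntax error.
        raise StoreError(INVALID_TEXT_REPRESENTATION,
                         "invalid input syntax: " + exc.msg) from exc
    if _contains_nul(value):
        # The text is valid JSON, but our conventions cannot represent it.
        raise StoreError(UNTRANSLATABLE_CHARACTER,
                         "\\u0000 cannot be converted to text")
    return value
```

With this split, a caller that sees the second code knows the input was
well-formed JSON and that the rejection is a matter of our conventions,
which is the point being made about the error wording.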
--
Peter Geoghegan
