Re: jsonb, unicode escapes and escaped backslashes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb, unicode escapes and escaped backslashes
Date: 2015-01-28 17:48:45
Message-ID: 3593.1422467325@sss.pgh.pa.us
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Tue, Jan 27, 2015 at 03:56:22PM -0500, Tom Lane wrote:
>> So at this point I propose that we reject \u0000 when de-escaping JSON.

> I would have agreed on 2014-12-09, and this release is the last chance to make
> such a change. It is a bold wager that could pay off, but -1 from me anyway.

You only get to vote -1 if you have a credible alternative. I don't see
one.

> I can already envision the blog post from the DBA staying on 9.4.0 because
> 9.4.1 pulled his ability to store U+0000 in jsonb.

Those will be more or less the same people who bitch about text not
storing NULs; the world has not fallen.

> jsonb was *the* top-billed
> 9.4 feature, and this thread started with Andrew conveying a field report of a
> scenario more obscure than storing U+0000.

There is a separate issue, which is that our earlier attempt to make the
world safe for \u0000 actually broke other things. We still need to fix
that, but I think the fix probably consists of reverting that patch and
instead disallowing \u0000.
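
As a rough illustration of what "disallowing \u0000 when de-escaping" could look like, here is a minimal Python sketch. It is not PostgreSQL's C code; the function name `de_escape` and the error messages are made up for the example, and surrogate pairs are deliberately ignored.

```python
import re

# Single-character JSON escapes and the characters they stand for.
_SIMPLE = {'"': '"', '\\': '\\', '/': '/',
           'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}

def de_escape(body: str) -> str:
    """De-escape the contents of a JSON string literal (quotes removed).

    Hypothetical helper: rejects \\u0000 instead of trying to store it.
    Surrogate pairs are not combined; this is only a sketch.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != '\\':
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc == 'u':
            digits = body[i + 2:i + 6]
            if not re.fullmatch(r'[0-9a-fA-F]{4}', digits):
                raise ValueError("\\u must be followed by four hex digits")
            code = int(digits, 16)
            if code == 0:
                # The proposal: refuse U+0000 outright.
                raise ValueError("\\u0000 cannot be converted to text")
            out.append(chr(code))
            i += 6
        elif esc in _SIMPLE:
            out.append(_SIMPLE[esc])
            i += 2
        else:
            raise ValueError("invalid escape sequence")
    return ''.join(out)
```

With this, an escaped backslash followed by `u0000` is still accepted as six ordinary characters, while a genuine `\u0000` escape raises an error.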

>> Anybody who's seriously unhappy with that can propose a patch to fix it
>> properly in 9.5 or later.

> Someone can still do that by introducing a V2 of the jsonb binary format and
> preserving the ability to read both formats. (Too bad Andres's proposal to
> include a format version didn't inform the final format, but we can wing it.)
> I agree that storing U+0000 as 0x00 is the best end state.

We will not need a v2 format, merely values that contain NULs. Existing
data containing \u0000 will be read as those six ASCII characters, which
is not really wrong since that might well be what it was anyway.
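
The ambiguity mentioned above can be sketched in a few lines of Python. The function `legacy_store` is a toy stand-in, assuming (as the paragraph implies) that the existing behavior keeps `\u0000` as literal text while an escaped backslash is de-escaped normally; all other escapes are left out of the model.

```python
import re

def legacy_store(body: str) -> str:
    """Toy model of the assumed existing de-escaping, limited to two cases:
    an escaped backslash becomes one backslash, and \\u0000 is kept as the
    six characters backslash, u, 0, 0, 0, 0."""
    def repl(match):
        if match.group(1) == '\\':
            return '\\'          # \\ -> single backslash
        return '\\u0000'         # \u0000 -> left as literal text
    return re.sub(r'\\(\\|u0000)', repl, body)

from_nul_escape = legacy_store(r'\u0000')     # input was a real U+0000 escape
from_backslash = legacy_store(r'\\u0000')     # input was backslash + "u0000"
```

Both inputs end up as the same six stored characters, so a reader of the stored value cannot tell which one was originally meant.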

>> We probably need to rethink the re-escaping behavior as well; I'm not
>> sure if your latest patch is the right answer for that.

> Yes, we do. No change to the representation of U+0000 is going to fix the
> following bug, but that patch does fix it:

> [local] test=# select
> test-# $$"\\u05e2"$$::jsonb = $$"\\u05e2"$$::jsonb,
> test-# $$"\\u05e2"$$::jsonb = $$"\\u05e2"$$::jsonb::text::jsonb;

The cause of this bug is commit 0ad1a816320a2b539a51628e2a0b1e83ff096b1d,
which I'm inclined to think we need to simply revert, not render even
more squirrely.
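
For readers who want to see why the round trip through text fails, here is a toy Python model. It assumes the output routine emits a backslash unescaped whenever it is followed by `u` and four hex digits; `toy_escape` is a hypothetical name, Python's `json` module stands in for the input parser, and control characters are not handled.

```python
import json
import re

_UESC = re.compile(r'u[0-9a-fA-F]{4}')

def toy_escape(stored: str) -> str:
    """Toy output routine (assumed behavior, not the real C code):
    a backslash followed by uXXXX is passed through unescaped."""
    out = ['"']
    for i, ch in enumerate(stored):
        if ch == '\\':
            if _UESC.match(stored, i + 1):
                out.append('\\')       # passed through: looks like \uXXXX
            else:
                out.append('\\\\')     # ordinary backslash, escaped
        elif ch == '"':
            out.append('\\"')
        else:
            out.append(ch)
    out.append('"')
    return ''.join(out)

# The value from the example: an escaped backslash followed by "u05e2".
stored = json.loads(r'"\\u05e2"')
round_tripped = json.loads(toy_escape(stored))
```

Here `stored` is six plain characters, but after output and re-parsing `round_tripped` is the single character U+05E2, so the two values no longer compare equal.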

regards, tom lane
