Re: jsonb, unicode escapes and escaped backslashes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb, unicode escapes and escaped backslashes
Date: 2015-01-29 22:21:14
Message-ID: 30821.1422570074@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jan 29, 2015 at 4:33 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> I'm coming down more and more on the side of Tom's suggestion just to ban
>> \u0000 in jsonb.

> I have yet to understand what we fix by banning \u0000. How is 0000
> different from any other four-digit hexadecimal number that's not a
> valid character in the current encoding? What does banning that one
> particular value do?

As Andrew pointed out upthread, it avoids having to answer the question of
what to return for

select (jsonb '["foo\u0000bar"]')->>0;

or any other construct which is supposed to return an *unescaped* text
representation of some JSON string value.

Right now you get

   ?column?
--------------
 foo\u0000bar
(1 row)

Which is wrong IMO, first because it violates the premise that the output
should be unescaped, and second because this output cannot be
distinguished from the (correct) output of

regression=# select (jsonb '["foo\\u0000bar"]')->>0;
   ?column?
--------------
 foo\u0000bar
(1 row)

There is no way to deliver an output that is not confusable with some
other value's correct output, other than by emitting a genuine \0 byte,
which unfortunately we cannot support in a TEXT result.

Potential solutions for this have been mooted upthread, but none
of them look like they're something we can do in the very short run.
So the proposal is to ban \u0000 until such time as we can do something
sane with it.

> In any case, whatever we do about that issue, the idea that the text
> -> json string transformation can *change the input string into some
> other string* seems like an independent problem.

No, it's exactly the same problem, because the reason for that breakage
is an ill-advised attempt to make it safe to include \u0000 in JSONB.

regards, tom lane
