jsonb, unicode escapes and escaped backslashes

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: jsonb, unicode escapes and escaped backslashes
Date: 2015-01-21 23:51:34
Message-ID: 54C03B86.80604@dunslane.net
Lists: pgsql-hackers

The following case has just been brought to my attention (look at the
differing number of backslashes):

andrew=# select jsonb '"\\u0000"';
jsonb
----------
"\u0000"
(1 row)

andrew=# select jsonb '"\u0000"';
jsonb
----------
"\u0000"
(1 row)

andrew=# select json '"\u0000"';
json
----------
"\u0000"
(1 row)

andrew=# select json '"\\u0000"';
json
-----------
"\\u0000"
(1 row)

The problem is that jsonb uses the parsed, unescaped value of the
string, while json does not. When the string parser sees the input with
the two backslashes, it outputs a single backslash, and then it encounters
the remaining characters and emits them as is, resulting in a token of
'\u0000'. When it encounters the input with one backslash, it recognizes
a unicode escape, and because it's for U+0000 emits '\u0000'. All other
unicode escapes are resolved, so the only ambiguity on input concerns
this case.
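
To make the collision concrete, here is a toy model of the de-escaping
step described above. It is only a sketch, not PostgreSQL's lexer: the
function name unescape_json_string is hypothetical, surrogate pairs are
ignored, and the input is assumed to be well-formed.

```python
def unescape_json_string(body: str) -> str:
    """Toy model of de-escaping a JSON string body (the text between the quotes)."""
    simple = {'"': '"', '\\': '\\', '/': '/',
              'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != '\\':
            # Ordinary character: copy it through unchanged.
            out.append(ch)
            i += 1
            continue
        esc = body[i + 1]
        if esc == 'u':
            code = int(body[i + 2:i + 6], 16)
            if code == 0:
                # As described in the message: the U+0000 escape is not
                # resolved, it is emitted as the six characters \u0000.
                out.append('\\u0000')
            else:
                # Every other unicode escape is resolved to its character.
                out.append(chr(code))
            i += 6
        else:
            # Two-character escapes, including \\ which yields one backslash.
            out.append(simple[esc])
            i += 2
    return ''.join(out)


two_backslashes = unescape_json_string(r'\\u0000')  # escaped backslash + "u0000"
one_backslash = unescape_json_string(r'\u0000')     # unicode escape for U+0000
print(two_backslashes == one_backslash)             # both are the token \u0000
```

In this model the two inputs produce the same stored token, so nothing
downstream can tell which one the user wrote. For any other code point
the tokens differ, because the escape is resolved to a real character.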

Things get worse, though. On output, '\uabcd' for any four hex digits is
recognized as a unicode escape, and thus the backslash is not escaped,
so that we get:

andrew=# select jsonb '"\\uabcd"';
jsonb
----------
"\uabcd"
(1 row)

We could probably fix this fairly easily for non-U+0000 cases by having
jsonb_to_cstring use a different escape_json routine.
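
As a rough illustration of that idea, the sketch below contrasts an
output routine that leaves anything shaped like a unicode escape alone
with one that always escapes a backslash. Both function names are
hypothetical, neither is the actual escape_json code, and control
characters are ignored for brevity.

```python
import re

# A backslash, 'u', and exactly four hex digits.
_UNICODE_ESCAPE = re.compile(r'\\u[0-9A-Fa-f]{4}')


def escape_lenient(token: str) -> str:
    """Models the behaviour described above: \\uXXXX is passed through as is."""
    out = ['"']
    i = 0
    while i < len(token):
        if _UNICODE_ESCAPE.match(token, i):
            # Looks like a unicode escape, so the backslash is not escaped.
            out.append(token[i:i + 6])
            i += 6
            continue
        ch = token[i]
        if ch == '\\':
            out.append('\\\\')
        elif ch == '"':
            out.append('\\"')
        else:
            out.append(ch)
        i += 1
    out.append('"')
    return ''.join(out)


def escape_strict(token: str) -> str:
    """Alternative sketch: every backslash in the stored token is escaped."""
    out = ['"']
    for ch in token:
        if ch == '\\':
            out.append('\\\\')
        elif ch == '"':
            out.append('\\"')
        else:
            out.append(ch)
    out.append('"')
    return ''.join(out)


token = '\\uabcd'             # stored token: backslash, u, a, b, c, d
print(escape_lenient(token))  # one backslash on output
print(escape_strict(token))   # two backslashes on output
```

The strict variant round-trips the '"\\uabcd"' input, but as noted it
does nothing for U+0000: there the stored token is the same for both
inputs, so no output routine can recover which one was meant.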

But it's a mess, sadly, and I'm not sure what a good fix for the U+0000
case would look like. Maybe we should detect such input and emit a
warning of ambiguity? It's likely to be rare enough, but clearly not as
rare as we'd like, since this is a report from the field.

cheers

andrew
