Re: JSON for PG 9.2

From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)toroid(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 20:47:05
Message-ID: CAARyMpAW9N+6_kcob-R=DMP3HyvXXduA3kAecsocLPNZq04CyQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 31, 2012 at 1:29 PM, Abhijit Menon-Sen <ams(at)toroid(dot)org> wrote:
> At 2012-01-31 12:04:31 -0500, robertmhaas(at)gmail(dot)com wrote:
>>
>> That fails to answer the question of what we ought to do if we get an
>> invalid sequence there.
>
> I think it's best to categorically reject invalid surrogates as early as
> possible, considering the number of bugs that are related to them (not
> in Postgres, just in general). I can't see anything good coming from
> letting them in and leaving them to surprise someone in future.
>
> -- ams

+1
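Rejecting invalid surrogates early could look roughly like the following minimal Python sketch. `check_surrogates` is a hypothetical helper, not PostgreSQL's parser. It assumes it is given the raw text between the quotes of a JSON string literal, and it checks only the \u escapes, leaving all other validation to the rest of the parser.

```python
import re

_HEX4 = re.compile(r"[0-9A-Fa-f]{4}")


def _read_u_escape(body: str, i: int):
    """Return the code unit of a \\uXXXX escape starting at body[i], else None."""
    if body.startswith("\\u", i) and _HEX4.fullmatch(body, i + 2, i + 6):
        return int(body[i + 2:i + 6], 16)
    return None


def check_surrogates(body: str) -> None:
    """Raise ValueError on any unpaired UTF-16 surrogate escape.

    `body` is the raw text between the quotes of a JSON string literal.
    Only \\u escapes are checked; other validation is left to the parser.
    """
    i = 0
    while i < len(body):
        if body[i] != "\\":
            i += 1
            continue
        if not body.startswith("\\u", i):
            i += 2  # two-character escape such as \n or an escaped backslash
            continue
        unit = _read_u_escape(body, i)
        if unit is None:
            raise ValueError(f"malformed \\u escape at offset {i}")
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed immediately by a low one.
            low = _read_u_escape(body, i + 6)
            if low is None or not 0xDC00 <= low <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at offset {i}")
            i += 12
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError(f"lone low surrogate at offset {i}")
        else:
            i += 6
```

With this, check_surrogates(r"\ud83d\ude00") accepts a valid pair, while r"\ud83d" or r"\ude00" on its own raises, so a bad sequence never gets past input.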

Another sequence to beware of is \u0000. While escaped NUL characters
are perfectly valid in JSON, NUL characters aren't allowed in TEXT
values. This means not all JSON strings can be converted to TEXT,
even in UTF-8. This may also complicate collation, if comparison
functions demand null-terminated strings.
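The \u0000 problem is easy to reproduce outside the database. This is a minimal Python sketch: `json_string_to_text` is a hypothetical helper, and the ctypes buffer merely stands in for any C routine that treats its input as a NUL-terminated string.

```python
import ctypes
import json


def json_string_to_text(literal: str) -> str:
    """Decode a JSON string literal for a text type that cannot hold NUL.

    Hypothetical helper: raises ValueError if the decoded value contains
    U+0000, which a NUL-terminated representation cannot store.
    """
    value = json.loads(literal)
    if not isinstance(value, str):
        raise TypeError("expected a JSON string literal")
    if "\x00" in value:
        raise ValueError("JSON string contains \\u0000; cannot convert to text")
    return value


# "\u0000" is a valid JSON escape and decodes to a real NUL character.
decoded = json.loads('"a\\u0000b"')
print(repr(decoded))  # 'a\x00b'

# Even in UTF-8 the NUL survives as a 0x00 byte, so anything that reads the
# value as a C string stops early and sees only "a".
buf = ctypes.create_string_buffer(decoded.encode("utf-8"))
print(buf.value)  # b'a'
```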

I'm mostly in favor of allowing \u0000. Banning it means users can't
use JSON strings to marshal binary blobs, e.g. by mapping each byte to
a code point in U+0000..U+00FF and escaping the non-printable ones.
Instead, they have to use base64 or similar.
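The byte-per-code-point encoding, and the base64 fallback, might look like this in Python. The helper names are hypothetical. The sketch relies on Latin-1 mapping each byte 0x00..0xFF to the code point of the same value, and on `json.dumps` (with its default `ensure_ascii=True`) escaping control characters and non-ASCII code points as \uXXXX.

```python
import base64
import json


def blob_to_json(blob: bytes) -> str:
    # Latin-1 maps byte 0xNN to code point U+00NN, so every byte becomes one
    # character in U+0000..U+00FF; json.dumps escapes the non-printable ones.
    return json.dumps(blob.decode("latin-1"))


def json_to_blob(text: str) -> bytes:
    # Reverse mapping; raises UnicodeEncodeError if the string contains a
    # code point above U+00FF, i.e. it was not produced by blob_to_json.
    return json.loads(text).encode("latin-1")


def blob_to_json_base64(blob: bytes) -> str:
    # The fallback if \u0000 is banned: base64 output never contains a NUL.
    return json.dumps(base64.b64encode(blob).decode("ascii"))


def json_base64_to_blob(text: str) -> bytes:
    return base64.b64decode(json.loads(text))


data = bytes([0x00, 0x01, 0x41, 0xFF])
print(blob_to_json(data))         # "\u0000\u0001A\u00ff"
print(blob_to_json_base64(data))  # "AAFB/w=="
```

Both forms round-trip. The escaped form keeps mostly-printable data readable, but it only works if \u0000 is accepted.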

Banning \u0000 doesn't quite violate the RFC:

    An implementation may set limits on the length and character
    contents of strings.

-Joey
