Re: JSON for PG 9.2

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-19 22:59:03
Message-ID: 4F18A037.1070305@dunslane.net
Lists: pgsql-hackers

On 01/19/2012 04:12 PM, Robert Haas wrote:
> On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> On 01/19/2012 03:49 PM, Robert Haas wrote:
>>> In other words, let's decree that when the database encoding isn't
>>> UTF-8, *escaping* of non-ASCII characters doesn't work. But
>>> *unescaped* non-ASCII characters should still work just fine.
>> The spec only allows unescaped Unicode chars (and for our purposes that
>> means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
>> result in something that's not legal JSON. See
>> <http://www.ietf.org/rfc/rfc4627.txt?number=4627> section 3.
> I understand. I'm proposing that we not care. In other words, if the
> server encoding is UTF-8, it'll really be JSON. But if the server
> encoding is something else, it'll be almost-JSON. And specifically,
> the \uXXXX syntax won't work, and there might be some non-Unicode
> characters in there. If that's not the behavior you want, then use
> UTF-8.
>
> It seems pretty clear that we're going to have to make some trade-off
> to handle non-UTF8 encodings, and I think what I'm suggesting is a lot
> less painful than disabling high-bit characters altogether. If we do
> that, then what happens if a user runs EXPLAIN (FORMAT JSON) and his
> column label has a non-Unicode character in there? Should we say, oh,
> sorry, you can't explain that in JSON format? That is mighty
> unfriendly, and probably mighty complicated and expensive to figure
> out, too. We *do not support* mixing encodings in the same database,
> and if we make it the job of this patch to fix that problem, we're
> going to be in the same place for 9.2 that we have been for the last
> several releases: nowhere.

OK, then we need to say that very clearly and up front (including in the
EXPLAIN docs).

Of course, for data going to the client, if the client encoding is UTF8,
they should get legal JSON, regardless of what the database encoding is,
and conversely too, no?
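
To make the point concrete, here is a minimal sketch in Python, not
PostgreSQL code. The sample document and the helper is_legal_json_utf8
are made up for illustration, and the explicit decode/encode step only
stands in for whatever server-to-client encoding conversion would apply.
It shows that the same text is not legal JSON as ISO-8859-1 bytes but
becomes legal once transcoded to UTF-8:

```python
import json

# Hypothetical document containing one non-ASCII character (e-acute, U+00E9).
doc = '{"label": "caf\u00e9"}'

# Roughly what a LATIN1 database would hold: U+00E9 is the single byte 0xE9.
latin1_bytes = doc.encode("iso-8859-1")


def is_legal_json_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 and parses as JSON."""
    try:
        json.loads(raw.decode("utf-8"))
        return True
    except (UnicodeDecodeError, ValueError):
        return False


# Transcoding step, standing in for server-to-client encoding conversion.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")

print(is_legal_json_utf8(latin1_bytes))  # False: a bare 0xE9 is not valid UTF-8
print(is_legal_json_utf8(utf8_bytes))    # True: same text, now legal JSON
```

The check treats "legal JSON" as "valid UTF-8 that parses", which is the
reading of RFC 4627 section 3 used earlier in this thread.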

cheers

andrew