Re: JSON for PG 9.2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-19 20:49:48
Message-ID: CA+TgmoYqhmMJtNp_w0MuFA7kL1J1S9EQDx7bZex8ZX+=SJXEQA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Second, what should we do when the database encoding isn't UTF8? I'm
> inclined to emit a \unnnn escape for any non-ASCII character (assuming it
> has a unicode code point - are there any code points in the non-unicode
> encodings that don't have unicode equivalents?). The alternative would be to
> fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
> to deal with JSON should be using UTF8 anyway, but we still have to deal
> with these things. What about SQL_ASCII? If there's a non-ASCII sequence
> there we really have no way of telling what it should be. There at least I
> think we should probably error out.

I don't see any reason to escape anything more than the minimum
required by the spec, which requires it only for control characters,
the double quote, and the backslash.
If somebody's got a non-ASCII character in there, we can simply allow
it to be represented by itself. That's almost certainly more compact
(and very possibly more readable) than emitting \uXXXX for each such
instance, and it also matches what the current EXPLAIN (FORMAT JSON)
output does.
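
To make the "minimum required" idea concrete, here is a rough sketch in
Python (not the PostgreSQL implementation; the function name
escape_json_string is made up for illustration). It escapes only the
quotation mark, the backslash, and control characters below U+0020, which
is what the JSON grammar demands, and passes every other character through
untouched:

```python
import json

# Short escapes defined by the JSON grammar for characters that must be escaped.
_SHORT_ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '\b': '\\b',
    '\f': '\\f',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
}


def escape_json_string(s: str) -> str:
    """Hypothetical helper: quote s as a JSON string, escaping only what
    the grammar requires and leaving non-ASCII characters as themselves."""
    out = ['"']
    for ch in s:
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters have no short form.
            out.append('\\u%04x' % ord(ch))
        else:
            # Everything else, including non-ASCII, is emitted unescaped.
            out.append(ch)
    out.append('"')
    return ''.join(out)


minimal = escape_json_string('café "au lait"\n')
everything = json.dumps('café "au lait"\n')  # ensure_ascii=True by default
print(minimal)
print(everything)
```

The second line printed uses \u00e9 for the non-ASCII character, so it is
longer than the first; both parse back to the same string.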

In other words, let's decree that when the database encoding isn't
UTF-8, *escaping* of non-ASCII characters doesn't work. But
*unescaped* non-ASCII characters should still work just fine.
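
One possible reading of that rule, sketched in Python purely as an
illustration (the function check_unicode_escapes and the encoding names it
compares against are assumptions, not anything in the PostgreSQL source):
raw non-ASCII characters are always accepted, while a \uXXXX escape naming
a code point above U+007F is rejected unless the database encoding is
UTF-8.

```python
import re

# Match one escape sequence at a time so that an escaped backslash followed
# by "u00e9" is not mistaken for a Unicode escape.
_ESCAPE = re.compile(r'\\(?:u([0-9a-fA-F]{4})|.)', re.DOTALL)


def check_unicode_escapes(json_text: str, database_encoding: str) -> None:
    """Hypothetical check: raise ValueError if json_text contains a \\uXXXX
    escape for a non-ASCII code point and the encoding is not UTF-8."""
    if database_encoding.upper() in ('UTF8', 'UTF-8'):
        return  # Any escape can be represented.
    for match in _ESCAPE.finditer(json_text):
        hex_digits = match.group(1)
        if hex_digits is not None and int(hex_digits, 16) > 0x7F:
            raise ValueError(
                'Unicode escape \\u%s is not supported when the database '
                'encoding is %s' % (hex_digits, database_encoding))


# Unescaped non-ASCII passes regardless of encoding.
check_unicode_escapes('"café"', 'LATIN1')
# ASCII-range escapes pass as well.
check_unicode_escapes('"\\u0041"', 'LATIN1')
```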

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
