Re: JSON for PG 9.2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 00:27:01
Message-ID: CA+TgmobEjwpwWE35MXddntgUYves047AC=zuN7gSVhmc8X4ZFw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 19, 2012 at 5:59 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> OK, then we need to say that very clearly and up front (including in the
> EXPLAIN docs.)

Can do.

> Of course, for data going to the client, if the client encoding is UTF8,
> they should get legal JSON, regardless of what the database encoding is, and
> conversely too, no?

Well, that would be nice, but I don't think it's practical. It will
certainly be the case, under the scheme I'm proposing, or probably any
other sensible scheme also, that if a client whose encoding is UTF-8
gets a value of type json back from the database, it's strictly valid
JSON. But it won't be possible to store every legal JSON value in the
database if the database encoding is anything other than UTF-8, even
if the client encoding is UTF-8. The backend will get the client's
UTF-8 bytes and transcode them to the server encoding before calling
the type-input function, so if there are characters in there that
can't be represented in the server encoding, then we'll error out before the JSON
data type ever gets control. In theory, it would be possible to
accept such strings if the client chooses to represent them using a
\uXXXX sequence, but I'm unexcited about doing the work required to
make that happen, because it will still be pretty half-baked: we'll
be able to accept some representations of the same JSON constant but
not others.
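
To make the asymmetry concrete, here is a rough Python sketch of the
transcoding step described above. It is only an illustration, not
backend code: the helper names (transcode_to_server, try_store) are
made up for this example, and LATIN1 is assumed as the server encoding.

```python
import json

def transcode_to_server(client_bytes: bytes, server_encoding: str) -> bytes:
    """Hypothetical stand-in for the backend's conversion step:
    decode the client's UTF-8 bytes, then re-encode them in the server
    encoding. Raises UnicodeEncodeError for unrepresentable characters."""
    return client_bytes.decode("utf-8").encode(server_encoding)

def try_store(client_bytes: bytes, server_encoding: str):
    """Return the transcoded bytes, or None if conversion fails, which is
    the point where the error would occur before any JSON input function
    gets to look at the value."""
    try:
        return transcode_to_server(client_bytes, server_encoding)
    except UnicodeEncodeError:
        return None

# The same JSON string constant, written two ways by the client.
literal = '"caf\u00e9 \u20ac"'.encode("utf-8")  # raw UTF-8 bytes for the e-acute and euro sign
escaped = b'"caf\\u00e9 \\u20ac"'               # \uXXXX escapes, plain ASCII on the wire

# Both spellings denote the same JSON value.
same_value = json.loads(literal.decode("utf-8")) == json.loads(escaped.decode("ascii"))

# Assumed server encoding: LATIN1, which has no euro sign.
literal_result = try_store(literal, "latin-1")  # conversion fails
escaped_result = try_store(escaped, "latin-1")  # ASCII passes through unchanged
```

With a LATIN1 server the literal form is rejected during conversion,
while the escaped form of the very same constant survives it, which is
the "some representations but not others" situation.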

I think the real fix for this problem is to introduce an
infrastructure inside the database that allows us to have different
columns stored in different encodings. People use bytea for that
right now, but that's pretty unfriendly: it would be nice to have a
better system. However, I expect that to take a lot of work and break
a lot of things, and until we do it I don't feel that compelled to
provide buggy and incomplete support for it under the guise of
implementing a JSON datatype.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
