Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Date: 2011-07-18 23:36:31
Message-ID: 6D8C16B0-E98C-48B5-899A-6566C5E9A0AD@phlo.org
Lists: pgsql-hackers

On Jul 19, 2011, at 00:17, Joey Adams wrote:
> I suppose a simple solution would be to convert all escapes and
> outright ban escapes of characters not in the database encoding.

+1. Making JSON work like TEXT when it comes to encoding issues
makes this all much simpler conceptually. It also avoids all kinds
of weird issues if you extract textual values from a JSON document
server-side.

If we really need more flexibility than that, we should look at
ways to allow different columns to have different encodings. Doing
that just for JSON seems wrong, especially because it doesn't really
reduce the complexity of the problem, as your examples show. The
essential problem here is, AFAICS, that there's really no sane way to
compare strings in two different encodings, unless both encode a
subset of Unicode only.

> This would have the nice property that all strings can be unescaped
> server-side. Problem is, what if a browser or other program produces,
> say, \u00A0 (NO-BREAK SPACE), and tries to insert it into a database
> where the encoding lacks this character?

They'll get an error - just as if they had tried to store that same
character in a TEXT column.
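
A minimal sketch of that behaviour, assuming Python's codecs as a
stand-in for the server encoding. The helper name
unescape_for_encoding is hypothetical and this is not PostgreSQL
code. It resolves all \uXXXX escapes and fails if any resulting
string cannot be represented in the target encoding:

```python
import json


def unescape_for_encoding(json_text, server_encoding):
    """Parse JSON (resolving \\uXXXX escapes) and reject any string that
    server_encoding cannot represent.

    Hypothetical helper for illustration only; server_encoding is a
    Python codec name standing in for the database encoding.
    """
    value = json.loads(json_text)

    def check(node):
        if isinstance(node, str):
            # Raises UnicodeEncodeError if a character is not representable.
            node.encode(server_encoding)
        elif isinstance(node, list):
            for item in node:
                check(item)
        elif isinstance(node, dict):
            for key, item in node.items():
                check(key)
                check(item)

    check(value)
    return value
```

With "ascii" as the target, a document containing \u00A0 is rejected
with UnicodeEncodeError; with "latin-1", which does contain NO-BREAK
SPACE, the same document is accepted.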

> On the other hand, converting all JSON to UTF-8 would be simpler to
> implement. It would probably be more intuitive, too, given that the
> JSON RFC says, "JSON text SHALL be encoded in Unicode."

Yet the only reason I'm aware of for some people not to use UTF-8
as the server encoding is that it's pretty inefficient storage-wise for
some scripts (AFAIR some Japanese scripts are an example, but I don't
remember the details). By making JSON always store UTF-8 on disk, the
JSON type gets less appealing to those people.
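
A rough illustration of the size difference, assuming Python's
euc_jp codec as a stand-in for an EUC_JP server encoding. The helper
name encoded_sizes and the sample string are chosen here for
illustration only:

```python
def encoded_sizes(text, encodings=("utf-8", "euc_jp")):
    """Return the encoded byte length of text under each codec.

    Illustrative helper; the codec names are Python's, not PostgreSQL's.
    """
    return {name: len(text.encode(name)) for name in encodings}


# Three kanji: each takes 3 bytes in UTF-8 but 2 bytes in EUC-JP.
sizes = encoded_sizes("日本語")
print(sizes)
```

For text that is mostly kanji and kana, UTF-8 therefore needs about
1.5 times the space of EUC-JP.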

best regards,
Florian Pflug
