Re: JSON for PG 9.2

From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-14 23:11:57
Message-ID: CAARyMpDS_4xcwWPH3XXcxBbOqEmGyc9YCkCXcH9q=pka1PQZYg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Second, what should we do when the database encoding isn't UTF8? I'm
> inclined to emit a \unnnn escape for any non-ASCII character (assuming it
> has a Unicode code point - are there any code points in the non-Unicode
> encodings that don't have Unicode equivalents?). The alternative would be to
> fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
> to deal with JSON should be using UTF8 anyway, but we still have to deal
> with these things. What about SQL_ASCII? If there's a non-ASCII sequence
> there we really have no way of telling what it should be. There at least I
> think we should probably error out.
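
For reference, the \unnnn idea from the quoted paragraph would look
roughly like this for a single character (a sketch only, assuming the
character has already been converted from the server encoding to a
Unicode code point; anything above U+FFFF needs a surrogate pair):

    #include <stdio.h>

    /*
     * Sketch of the quoted \unnnn idea: emit a JSON escape for one
     * Unicode code point, using a UTF-16 surrogate pair for code
     * points above U+FFFF.  Not backend code.
     */
    static void
    emit_unicode_escape(FILE *out, unsigned int cp)
    {
        if (cp <= 0xFFFF)
            fprintf(out, "\\u%04X", cp);
        else
        {
            cp -= 0x10000;
            fprintf(out, "\\u%04X\\u%04X",
                    0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
        }
    }

The escaping itself is the easy part; getting from the server encoding
to a code point at all is where the points below come in.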

I don't think there is a satisfying solution to this problem. Things
working against us:

* Some server encodings support characters that don't map to Unicode
characters (e.g. unused slots in Windows-1252). Thus, converting to
UTF-8 and back is lossy in general.

* We want a normalized representation for comparison. This will
involve a mixture of server-encoding and Unicode characters, unless
the server encoding is UTF-8.

* We can't efficiently convert individual characters to and from
Unicode with the current API.

* What do we do about \u0000? TEXT datums cannot contain NUL
characters (see the sketch below).
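
To make the \u0000 point concrete: once the escape is decoded, the
embedded NUL truncates anything that passes through a NUL-terminated C
string, and the text type's input/output path relies on exactly that.
A minimal standalone illustration (not backend code, just the failure
mode):

    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        /* The decoded form of the JSON string "ab\u0000cd". */
        const char decoded[] = {'a', 'b', '\0', 'c', 'd'};

        /*
         * Round-tripping through a NUL-terminated C string, as the
         * text I/O functions do, silently drops everything after the
         * embedded NUL.
         */
        char buf[sizeof(decoded) + 1];
        memcpy(buf, decoded, sizeof(decoded));
        buf[sizeof(decoded)] = '\0';

        printf("kept %zu of 5 bytes\n", strlen(buf));   /* prints 2 */
        return 0;
    }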

I'd say just ban Unicode escapes and non-ASCII characters unless the
server encoding is UTF-8, and ban all \u0000 escapes. It's easy, and
whatever we support later will be a superset of this.
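
Spelled out in code, that rule amounts to roughly the following (an
illustrative sketch, not a patch; server_is_utf8 stands in for a real
encoding check such as GetDatabaseEncoding() == PG_UTF8, and raw
non-ASCII bytes in the input would get a similar check):

    #include <stdbool.h>
    #include <string.h>

    /*
     * Sketch of the proposed rule: \u0000 is always rejected, and any
     * other \uXXXX escape is rejected unless the database encoding is
     * UTF-8.  "hex" points at the four hex digits after "\u".
     */
    static bool
    unicode_escape_allowed(const char *hex, bool server_is_utf8)
    {
        if (strncmp(hex, "0000", 4) == 0)
            return false;       /* ban all \u0000 escapes */
        if (!server_is_utf8)
            return false;       /* ban other escapes outside UTF-8 */
        return true;
    }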

Strategies for handling this situation have been discussed in prior
emails. This is where things got stuck last time.

- Joey
