Re: is there a deep unyielding reason to limit U&'' literals to ASCII?

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Chapman Flack <chap(at)anastigmatix(dot)net>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: is there a deep unyielding reason to limit U&'' literals to ASCII?
Date:	2016-01-25 17:43:53
Message-ID:	CA+TgmobUp8Q-wcjaKvV=sbDcziJoUUvBCB8m+_xhgOV4DjiA1A@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Jan 23, 2016 at 11:27 PM, Chapman Flack <chap(at)anastigmatix(dot)net> wrote:
> I see in the documentation (and confirm in practice) that a
> Unicode character string literal U&'...' is only allowed to have
> <Unicode escape value>s representing Unicode characters if the
> server encoding is, exactly and only, UTF8.
>
> Otherwise, it can still have <Unicode escape value>s, but they can only
> be in the range \+000001 to \+00007f and can only represent ASCII characters
> ... and this isn't just for an ASCII server encoding but for _any server
> encoding other than UTF8_.
>
> I'm a newcomer here, so maybe there was an existing long conversation
> where that was determined to be necessary for some deep reason, and I
> just need to be pointed to it.
>
> What I would have expected would be to allow <Unicode escape value>s
> for any Unicode codepoint that's representable in the server encoding,
> whatever encoding that is. Indeed, that's how I read the SQL standard
> (or my scrounged 2006 draft of it, anyway). The standard even lets
> you precede U& with _charsetname and have the escapes be allowed to
> be any character representable in the specified charset. *That*, I assume,
> would be tough to implement in PostgreSQL, since strings don't walk
> around with their own personal charsets attached. But what's the reason
> for not being able to mention characters available in the server encoding?

I don't know anything for sure here, but I wonder if it would make
validating string literals in non-UTF8 encodings significantly more
costly. When the encoding is UTF-8, the test as to whether the escape
sequence forms a legal code point doesn't require any table lookups.
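
To make the contrast concrete, here is a rough sketch in Python. It is not PostgreSQL's code; the function names are made up for illustration, and excluding code point zero is an assumption taken from the \+000001 lower bound quoted above. The first check is pure range arithmetic, while the second has to ask a codec whether the target encoding has the character at all, which for most single-byte encodings means consulting a mapping table.

```python
def is_valid_code_point(cp: int) -> bool:
    """Range check only: nonzero, within Unicode, not a surrogate.

    No conversion tables are involved (hypothetical helper).
    """
    return 0 < cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def representable_in(cp: int, encoding: str) -> bool:
    """Check whether the code point exists in the given encoding.

    This relies on the codec's mapping from Unicode to the target
    encoding (hypothetical helper; `encoding` is a Python codec name).
    """
    if not is_valid_code_point(cp):
        return False
    try:
        chr(cp).encode(encoding)
    except UnicodeEncodeError:
        return False
    return True
```

For example, representable_in(0x20AC, "latin-1") is false because Latin-1 has no euro sign, while the same call with "iso8859_15" is true. The range check alone cannot tell those two cases apart.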

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

is there a deep unyielding reason to limit U&'' literals to ASCII? at 2016-01-24 04:27:07 from Chapman Flack

Responses

Re: is there a deep unyielding reason to limit U&'' literals to ASCII? at 2016-01-25 17:52:38 from Tom Lane