Re: Unicode escapes in literals

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unicode escapes in literals
Date: 2008-10-23 15:04:43
Message-ID: 4900928B.60300@gmx.net
Lists: pgsql-hackers

Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>> SQL has the following escape syntax for it:
>> U&'special character: \xxxx' [ UESCAPE '\' ]
>
> Man that's ugly. Why the ampersand?

Yeah, excellent question. It seems completely unnecessary, but it is
surely there in the syntax diagram.

> How do you propose to distinguish
> this from a perfectly legitimate use of the & operator?

Well, technically, there is going to be some conflict, but the practical
impact should be minimal because:

- No spaces are allowed within the U&' prefix, whereas we typically
suggest spaces around binary operators.

- Naming a column "u" might not be terribly common.

- Binary-and with an undecorated string literal is not very common.

Of course, I have no data for these assertions. An inquiry on -general
might give more insight.
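
To make the first point concrete, here is a rough Python sketch of the
lexing rule I have in mind. It is only an illustration, not scanner code:
the function name starts_unicode_literal and the regular expression are
made up for this example. The idea is that the exact sequence U&' with no
whitespace is taken as a literal prefix, and anything else is left to
ordinary lexing.

```python
import re

# Assumed rule for illustration: "U&'" (either case, no whitespace in
# between) opens a Unicode-escape string literal.
UNICODE_LITERAL_START = re.compile(r"[uU]&'")


def starts_unicode_literal(sql: str, pos: int = 0) -> bool:
    """Return True if a U&' literal prefix begins at sql[pos]."""
    # If the previous character is part of an identifier, the "u" is the
    # tail of a longer name (e.g. "foou"), not a literal prefix.
    if pos > 0 and (sql[pos - 1].isalnum() or sql[pos - 1] == "_"):
        return False
    return UNICODE_LITERAL_START.match(sql, pos) is not None
```

With this rule, "u & 'abc'" is still column u, the & operator, and a
plain literal. Only the unspaced form "u&'abc'" changes meaning, which is
the conflict described above.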

>> 2. Convert this syntax to a function call. But that would then create a
>> lot of inconsistencies, such as needing functional indexes for matches
>> against what should really be a literal.
>
> Uh, why do you think that? The function could surely be stable, even
> immutable if you grant that a database's encoding can't change.

Yeah, true, that would work.

There are some other disadvantages to making it a function call. You
couldn't use that kind of literal in any other place where the parser
calls for a string constant: role names, tablespace locations,
passwords, copy delimiters, enum values, function body, file names.

There is also a related feature for Unicode escapes in identifiers, and
it might be good to keep the door open on that.

We could do a dual approach: Convert in the scanner when server encoding
is UTF8, and pass on as function call otherwise. Surely ugly though.
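
For the scanner-side half of that, the conversion itself is simple. Below
is a minimal Python sketch of decoding the body of such a literal; it is
an assumption-laden illustration, not proposed code. The function name
decode_unicode_escapes is invented, the quotes are assumed to be stripped
already, and it handles the four-digit \XXXX form, the six-digit \+XXXXXX
form, and a doubled escape character standing for itself. It does not
reject malformed escapes or out-of-range code points gracefully, and it
says nothing about what to do when the server encoding is not UTF8.

```python
import re


def decode_unicode_escapes(body: str, escape: str = "\\") -> str:
    """Decode the body of a U&'...' literal (surrounding quotes removed).

    `escape` plays the role of the optional UESCAPE character.
    """
    if len(escape) != 1:
        raise ValueError("escape must be a single character")

    esc = re.escape(escape)
    # Alternatives, in order: doubled escape, 4 hex digits, '+' and 6 hex digits.
    pattern = re.compile(
        esc + r"(?:(" + esc + r")|([0-9A-Fa-f]{4})|\+([0-9A-Fa-f]{6}))"
    )

    def replace(match: re.Match) -> str:
        doubled, four, six = match.groups()
        if doubled is not None:
            return escape  # doubled escape character means a literal one
        return chr(int(four or six, 16))

    # Escape characters not followed by a valid form are left untouched;
    # a real scanner would report a syntax error instead.
    return pattern.sub(replace, body)
```

For example, decode_unicode_escapes("d\\0061t\\+000061") gives "data",
and passing "!" as the second argument decodes "d!0061t!+000061" the
same way.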

Or pass it on as a separate token type to the analyze phase, but that is
a lot more work.

Others: What use cases do you envision, and what requirements would they
create for this feature?
