Unicode escapes in literals

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Unicode escapes in literals
Date: 2008-10-23 08:42:03
Message-ID: 490038DB.5070602@gmx.net
Lists: pgsql-hackers

I would like to add an escape mechanism to PostgreSQL for entering
arbitrary Unicode characters into string literals. Currently there are
only two options. One is to enter the character directly via the keyboard
or cut-and-paste, which is difficult for a number of reasons, such as when
the font doesn't have the character. The other is to enter the
UTF8-encoded bytes using the E'...' strings, which is hardly usable.

The SQL standard has the following escape syntax for this:

U&'special character: \xxxx' [ UESCAPE '\' ]

where xxxx is the hexadecimal Unicode codepoint. So this is pretty much
just another variant on what the E'...' syntax does.
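
To make the escape rule concrete, here is a rough sketch in Python of how
the body of such a literal might be expanded. It is not a proposed
implementation and not scanner code. The function name
decode_unicode_escapes is made up for illustration, only the four-digit
\xxxx form described above is handled, and treating a doubled escape
character as a literal escape character is an assumption of the sketch.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_unicode_escapes(body: str, escape: str = "\\") -> str:
    """Expand <escape>xxxx sequences in the body of a U&'...' literal.

    `body` is the text between the quotes; `escape` is the UESCAPE
    character (backslash when UESCAPE is omitted).
    """
    if len(escape) != 1:
        raise ValueError("UESCAPE must be a single character")

    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != escape:
            # Ordinary character: copy through unchanged.
            out.append(ch)
            i += 1
            continue

        # Assumption: a doubled escape character stands for itself.
        if body[i + 1:i + 2] == escape:
            out.append(escape)
            i += 2
            continue

        # Otherwise exactly four hexadecimal digits must follow.
        digits = body[i + 1:i + 5]
        if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
            raise ValueError(f"invalid Unicode escape at offset {i}")
        out.append(chr(int(digits, 16)))
        i += 5

    return "".join(out)
```

For example, decode_unicode_escapes(r"d\0061t\0061") gives "data", and the
same result comes from decode_unicode_escapes("d!0061t!0061", "!") when a
different UESCAPE character is chosen.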

The trick is that since we have user-definable encoding conversion
routines, we can't convert the Unicode codepoint to the server encoding
in the scanner stage. I imagine there are two ways to address this:

1. Only support this syntax when the server encoding is UTF8. This
would probably cover most use cases anyway. We could have limited
support for characters in the ASCII range for all server encodings.

2. Convert this syntax to a function call. But that would then create a
lot of inconsistencies, such as needing functional indexes for matches
against what should really be a literal.
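
As an illustration of option 1, the following Python sketch shows the
kind of check that could be applied to each escaped codepoint. The
function name codepoint_to_server_bytes and the encoding name strings are
hypothetical, and the real decision would of course be made inside the
backend, not in Python. The idea is only that UTF8 servers accept any
codepoint, while other server encodings accept the ASCII range, whose
byte values are assumed here to be the same in every server encoding.

```python
def codepoint_to_server_bytes(codepoint: int, server_encoding: str) -> bytes:
    """Return the bytes for one escaped codepoint, or raise ValueError.

    Sketch of option 1: full support only when the server encoding is
    UTF8, limited support (ASCII range) for everything else.
    """
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError("codepoint out of Unicode range")

    if codepoint < 0x80:
        # Assumption: ASCII characters have the same single-byte
        # representation in every supported server encoding.
        return bytes([codepoint])

    if server_encoding.upper() == "UTF8":
        if 0xD800 <= codepoint <= 0xDFFF:
            # Lone surrogates are not valid characters in UTF-8.
            raise ValueError("surrogate codepoint cannot be encoded")
        return chr(codepoint).encode("utf-8")

    raise ValueError(
        "Unicode escapes above 007F require a UTF8 server encoding"
    )
```

With this rule, codepoint_to_server_bytes(0x41, "LATIN1") is accepted,
whereas codepoint_to_server_bytes(0xE9, "LATIN1") is rejected even though
LATIN1 can represent that character. That is the limitation option 1
accepts in exchange for not calling conversion routines from the scanner.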

I'd be happy to start with UTF8 support only. Other ideas?
