Re: proposal: unescape_text function

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: unescape_text function
Date: 2020-12-02 14:55:49
Message-ID: 5FC7AAF5.7010209@anastigmatix.net
Lists: pgsql-hackers

On 12/02/20 05:37, Pavel Stehule wrote:
> 2. there can be optional parameter "prefix" with default "\". But with "\u"
> it can be compatible with Java or Python.

Java's Unicode escape form is one of those early ones that lack
a six-digit form, and where any character outside of the Basic Multilingual
plane has to be represented by two four-digit escapes in a row, encoding
the two surrogates that would make up the character's representation
in UTF-16.
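
As a rough illustration of that Java-style form, here is a minimal Python
sketch that emits one four-digit escape per UTF-16 code unit. The helper name
java_escape is made up for this example and is not part of any proposal in
this thread.

```python
def java_escape(ch: str) -> str:
    """Escape a string Java-style: one \\uXXXX per UTF-16 code unit.

    Hypothetical helper for illustration only.
    """
    # Encoding to big-endian UTF-16 yields 2 bytes per code unit; a
    # non-BMP character becomes two units (high and low surrogate).
    units = ch.encode("utf-16-be")
    return "".join(
        "\\u{:04X}".format(int.from_bytes(units[i:i + 2], "big"))
        for i in range(0, len(units), 2)
    )


print(java_escape("\u00e9"))      # a BMP character: a single escape
print(java_escape("\U0001F600"))  # a non-BMP character: two escapes in a row
```

The second call produces a surrogate pair, which is the awkward part of this
representation: neither escape means anything on its own.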

Obviously that's an existing form that's out there, so it's not a bad
thing to have some kind of support for it, but it's not a great
representation to encourage people to use.

Python, by contrast, has both \uxxxx and \Uxxxxxxxx where you would use
the latter to represent a non-BMP character directly. So the Java and
Python schemes should be considered distinct.

In Perl, there is a useful extension to regexp substitution where
you specify the replacement not as a string or even a string with &
and \1 \2 ... magic, but as essentially a lambda that is passed the
match and returns a computed replacement. That makes conversions of
the sort discussed here generally trivial to implement. Would it be
worth considering adding something of general utility like that? Then
there could be a small library of pure SQL functions (or a wiki page
or GitHub gist) covering a bunch of the two dozen representations on
that page linked above.
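
For a sense of how little code such a facility needs, here is a sketch using
Python's re.sub, which likewise accepts a function as the replacement. It
handles only the Python-style \uXXXX and \UXXXXXXXX forms; the name
unescape_python_style is invented for this example, and nothing here reflects
an actual PostgreSQL API.

```python
import re

# Eight-digit form first, so that \U is never mistaken for anything shorter.
_ESCAPE = re.compile(r"\\U([0-9A-Fa-f]{8})|\\u([0-9A-Fa-f]{4})")


def unescape_python_style(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name.

    Hypothetical helper; surrogate escapes are passed through unpaired.
    """
    def replace(match: re.Match) -> str:
        # Exactly one of the two groups matched; use whichever did.
        digits = match.group(1) or match.group(2)
        return chr(int(digits, 16))

    # The callable is invoked once per match and returns the replacement.
    return _ESCAPE.sub(replace, text)


print(unescape_python_style(r"caf\u00e9 \U0001F600"))
```

Supporting another of the representations would mostly mean swapping the
pattern and the body of the inner function, which is the appeal of the
lambda-style replacement.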

Regards,
-Chap
