Re: proposal: unescape_text function

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: unescape_text function
Date: 2020-11-29 17:36:01
Message-ID: CAFj8pRBw-L4gsFLYJMMsLghio=qe9eh+QXWVj037VaCH+h6=eQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

pá 27. 11. 2020 v 15:37 odesílatel Peter Eisentraut <
peter(dot)eisentraut(at)enterprisedb(dot)com> napsal:

> On 2020-10-07 11:00, Pavel Stehule wrote:
> > Since the idea originated from unescaping unicode string
> > literals i.e.
> > select unescape('Odpov\u011Bdn\u00E1 osoba');
> >
> > Shouldn't the built-in function support the above syntax as well?
> >
> >
> > good idea. The prefixes u (4 digits) and U (8 digits) are supported
>
> I don't really get the point of this function. There is AFAICT no
> function to produce this escaped format, and it's not a recognized
> interchange format. So under what circumstances would one need to use
> this?
>

Some corporate data can be in CSV format with escaped unicode characters.
Without this function it is not possible to decode these files without
external application.

Postgres has support for this conversion, but only for string literals.

CREATE OR REPLACE FUNCTION public.unescape(text, text)
RETURNS text
LANGUAGE plpgsql
AS $function$
DECLARE result text;
BEGIN
EXECUTE format('SELECT U&%s UESCAPE %s',
quote_literal(replace($1, '\u','^')),
quote_literal($2)) INTO result;
RETURN result;
END;
$function$

Because unicode is major encoding, I think this conversion should be
supported. There is another question about implementation like in this
patch implemented unicode_unescape function, or with some new conversion.
Using conversion
https://www.postgresql.org/docs/current/sql-createconversion.html is
probably better, but I am not sure how intuitive it is, and it is hard to
use it (without not nice workarounds) in plpgsql.

I don't expect so Postgres should produce data in unicode escaped format,
but can be useful, if Postgres can do some work with data in special format
of major encoding.

postgres=# select convert_from(E'Odpov\u011Bdn\u00E1 osoba', 'UTF8');
┌─────────────────┐
│ convert_from │
╞═════════════════╡
│ Odpovědná osoba │
└─────────────────┘
(1 row)

I can do this with bytea, but it is hard to use it with text fields.

I didn't find any way how to do it without ugly steps.

Regards

Pavel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Seltenreich 2020-11-29 18:40:50 [sqlsmith] Planner error on lateral joins
Previous Message Justin Pryzby 2020-11-29 17:21:15 Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)