Re: proposal: unescape_text function

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: unescape_text function
Date: 2020-12-02 05:48:02
Message-ID: CAFj8pRA-z=k5Ra+UNCXf2RqBCYWyQaDnE89YNgQW-0PQ9Jwxtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

st 2. 12. 2020 v 0:05 odesílatel Andrew Dunstan <andrew(at)dunslane(dot)net>
napsal:

>
> On 11/30/20 8:14 AM, Peter Eisentraut wrote:
> > On 2020-11-29 18:36, Pavel Stehule wrote:
> >>
> >> I don't really get the point of this function. There is AFAICT no
> >> function to produce this escaped format, and it's not a recognized
> >> interchange format. So under what circumstances would one need to
> >> use this?
> >>
> >>
> >> Some corporate data can be in CSV format with escaped unicode
> >> characters. Without this function it is not possible to decode these
> >> files without external application.
> >
> > I would like some supporting documentation on this. So far we only
> > have one stackoverflow question, and then this implementation, and
> > they are not even the same format. My worry is that if there is not
> > precise specification, then people are going to want to add things in
> > the future, and there will be no way to analyze such requests in a
> > principled way.
> >
> >
> >
>
>
> Also, should this be an extension? I'm dubious about including such
> marginal uses in the core code unless there's a really good case for it.
>

I am not sure, and I am inclined so it should be core functionality.

1. Although this use case is marginal, this is related to most used
encodings - ascii and unicode. 8 bit encodings enhanced about escaped
multibyte chars will be used for a very long time. Unfortunately - this
will be worse, because Postgres will be used more in the corporate
environment, where there is a bigger press to conserve very legacy
technologies without correct multibyte support. The core problem so this
issue is out of concept bytea -> text or text -> bytea transformations
supported by Postgres. This is text -> text transformation (for almost all
encoding based on ascii), that is not supported by Postgres now.

2. Postgres already has this functionality - but unfortunately there is a
limit just only literal constants.

create or replace function uunescape(text)
returns text as $$
declare r text;
begin
-- don't use this code!!!
execute 'select e''' || $1 || '''' into r;
return r;
end;
$$ language plpgsql immutable;

But one way how anybody can use it is SQL injection vulnerable and slow. So
some simple buildin solution can be protection against some future security
issues. Personally I am happy with just this limited function that will be
safe (although the design based on introducing new encoding and conversions
can be more complete and accurate). I agree so this case is marginal, but
it is a fully valid use case, and supporting unicode escaped codes just by
parser is a needless limit.

3. there are new disadvantages of extensions in current DBaaS times. Until
the extension is not directly accepted by a cloud provider, then the
extension is not available for users. The acceptance of extensions is not
too agile - so moving this code to extension doesn't solve this problem.
Without DBaaS the implementation of this feature as the extensions can be
good enough.

Regards

Pavel

>
> cheers
>
>
> andrew
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikolay Samokhvalov 2020-12-02 06:08:06 Re: pg_stat_statements oddity with track = all
Previous Message Fujii Masao 2020-12-02 04:52:43 Re: Add statistics to pg_stat_wal view for wal related parameter tuning