Re: PATCH: Add uri percent-encoding for binary data

From: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
To: Anders Åstrand <anders(at)449(dot)se>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Add uri percent-encoding for binary data
Date: 2019-10-07 21:38:15
Message-ID: CAMsGm5dGOiQm8vG=D7vAgMDyFG9U+L+eJOugTN2WhT5PY84DPA@mail.gmail.com
Lists: pgsql-hackers

On Mon, 7 Oct 2019 at 03:15, Anders Åstrand <anders(at)449(dot)se> wrote:

> Hello
>
> Attached is a patch for adding uri as an encoding option for
> encode/decode. It uses what's called "percent-encoding" in rfc3986
> (https://tools.ietf.org/html/rfc3986#section-2.1).
>
> The background for this patch is that I could easily build urls in
> plpgsql, but doing the actual encoding of the url parts is painfully
> slow. The list of available encodings for encode/decode looks quite
> arbitrary to me, so I can't see any reason this one couldn't be in
> there.
>
> In modern web scenarios one would most likely want to encode
> the UTF-8 representation of a text string for inclusion in a URL, in
> which case the correct invocation would be ENCODE(CONVERT_TO('some text in
> database encoding goes here', 'UTF8'), 'uri'), but uri
> percent-encoding can of course also be used for other text encodings
> and arbitrary binary data.
>

This seems like a useful idea to me. I've used the equivalent in Python and
it provides more options:

https://docs.python.org/3/library/urllib.parse.html#url-quoting

I suggest reviewing that documentation, because there are a few
details that need to be checked carefully. Whether space should be
encoded as plus, and whether certain byte values should be exempt from
%-encoding, depends on the application. Unfortunately, as far as I can
tell, there isn't a single version of URL encoding that satisfies all
situations (which explains the complexity of the Python
implementation). It might be feasible to leave out some of the Python
options (I'm wondering about the safe= parameter), but I'm pretty sure you
need at least the equivalents of quote and quote_plus.
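
To make the differences concrete, here is a small sketch using Python's
urllib.parse. The sample strings and variable names are only
illustrative, and this shows what the Python functions do, not what the
patch does or should do:

```python
from urllib.parse import quote, quote_plus, quote_from_bytes

sample = "a b/c&d"  # illustrative input, not taken from the patch

# quote: space becomes %20, and "/" is left alone because safe="/" by default.
path_style = quote(sample)

# quote_plus: space becomes "+", and "/" is encoded because safe="" by default.
form_style = quote_plus(sample)

# quote with safe="": encode everything except unreserved characters.
strict_style = quote(sample, safe="")

# Arbitrary binary data goes through quote_from_bytes.
binary_style = quote_from_bytes(b"\x00\xffA ")

# The text encoding matters: the same character yields different bytes.
utf8_style = quote("å")                       # UTF-8 bytes C3 A5
latin1_style = quote("å", encoding="latin-1")  # Latin-1 byte E5

for name, value in [
    ("quote", path_style),
    ("quote_plus", form_style),
    ("quote safe=''", strict_style),
    ("quote_from_bytes", binary_style),
    ("utf-8", utf8_style),
    ("latin-1", latin1_style),
]:
    print(f"{name:18} {value}")
```

The same input comes out three different ways depending on the function
and the safe= setting, which is why I don't think a single 'uri' option
covers every use.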
