Re: making the backend's json parser work in frontend code

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-23 17:49:58
Message-ID: 20200123174958.GA3138@momjian.us
Lists: pgsql-hackers

On Thu, Jan 23, 2020 at 02:23:14PM -0300, Alvaro Herrera wrote:
> On 2020-Jan-23, Robert Haas wrote:
>
> > No, that's not it. Suppose that Álvaro Herrera has some custom
> > settings he likes to put on all the PostgreSQL clusters that he uses,
> > so he creates a file álvaro.conf and uses an "include" directive in
> > postgresql.conf to suck in those settings. If he also likes UTF-8,
> > then the file name will be stored in the file system as a 12-byte
> > value of which the first two bytes will be 0xc3 0xa1. In that case,
> > everything will be fine, because JSON is supposed to always be UTF-8,
> > and the file name is UTF-8, and it's all good. But suppose he instead
> > likes LATIN-1.
>
> I do have files with Latin-1-encoded names in my filesystem, even though
> my system is UTF-8, so I understand the problem. I was wondering if it
> would work to encode any non-UTF8-valid name using something like
> base64; the encoded name will be plain ASCII and can be put in the
> manifest, probably using a different field of the JSON object -- so for
> a normal file you'd have { path => '1234/2345' } but for a
> Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmYK' }.
> Then it's the job of the tool to ensure it decodes the name to its
> original form when creating/querying for the file.
>
> A problem I have with this idea is that this is very corner-casey, so
> most tool implementors will never realize that there's a need to decode
> certain file names.

Another idea is to use base64 for all non-ASCII file names, so we don't
need to check whether the file name is valid UTF-8 before outputting it;
we just need to check for non-ASCII bytes, which is much easier. Another
problem, though, is how do you _flag_ file names as being
base64-encoded? Use another JSON field to specify that?
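
To make that concrete, here is a rough Python sketch of what I have in
mind, not a proposal for the actual manifest format. It assumes the
flagging is done with a separate field name, borrowing the "path_base64"
key from Alvaro's example; the function names manifest_entry and
original_name are made up for illustration.

```python
import base64
import json


def manifest_entry(raw_name: bytes) -> dict:
    """Build a manifest entry from a file name given as raw bytes.

    Pure-ASCII names go in "path" unchanged. Anything else is
    base64-encoded and stored under "path_base64" (assumed key name),
    so no UTF-8 validity check is needed.
    """
    if raw_name.isascii():
        return {"path": raw_name.decode("ascii")}
    return {"path_base64": base64.b64encode(raw_name).decode("ascii")}


def original_name(entry: dict) -> bytes:
    """Recover the raw file-name bytes from a manifest entry."""
    if "path_base64" in entry:
        return base64.b64decode(entry["path_base64"])
    return entry["path"].encode("ascii")


if __name__ == "__main__":
    names = [
        b"1234/2345",          # plain ASCII
        b"\xe1lvaro.conf",     # Latin-1 encoded name
        b"\xc3\xa1lvaro.conf", # UTF-8 encoded name, also non-ASCII
    ]
    for name in names:
        entry = manifest_entry(name)
        # The entry is always plain ASCII, so it is always valid JSON text.
        print(json.dumps(entry))
        assert original_name(entry) == name
```

Note that under this scheme the UTF-8 name gets encoded too, since the
only test is ASCII versus non-ASCII. A tool reading the manifest has to
check which of the two keys is present, which is the flag.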

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
