Re: making the backend's json parser work in frontend code

From: David Steele <david(at)pgmasters(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 16:36:58
Message-ID: 12b96994-47c2-c87d-2c9b-710d3e052b3b@pgmasters.net
Lists: pgsql-hackers

On 1/24/20 9:27 AM, Tom Lane wrote:
> Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
>> On 2020-01-23 18:04, Robert Haas wrote:
>>> Now, you might say "well, why don't we just do an encoding
>>> conversion?", but we can't. When the filesystem tells us what the file
>>> names are, it does not tell us what encoding the person who created
>>> those files had in mind. We don't know that they had *any* encoding in
>>> mind. IIUC, a file in the data directory can have a name that consists
>>> of any sequence of bytes whatsoever, so long as it doesn't contain
>>> prohibited characters like a path separator or \0 byte. But only some
>>> of those possible octet sequences can be stored in a manifest that has
>>> to be valid UTF-8.
>
>> I think it wouldn't be unreasonable to require that file names in the
>> database directory be consistently encoded (as defined by pg_control,
>> probably). After all, this information is sometimes also shown in
>> system views, so it's already difficult to process total junk. In
>> practice, this shouldn't be an onerous requirement.
>
> I don't entirely follow why we're discussing this at all, if the
> requirement is backing up a PG data directory. There are not, and
> are never likely to be, any legitimate files with non-ASCII names
> in that context. Why can't we just skip any such files?

It's not uncommon in my experience for users to drop odd files into
PGDATA (usually versioned copies of postgresql.conf, etc.), but I agree
that it should be discouraged. Even so, I don't recall ever seeing any
non-ASCII filenames.

Skipping files sounds scary. I'd prefer an error or a warning (and then
base64-encode the filename).

Regards,
--
-David
david(at)pgmasters(dot)net
