Re: making the backend's json parser work in frontend code

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 17:14:34
Message-ID: A7971FA1-D8A2-4A0A-BFDD-496FEBF9DE25@enterprisedb.com
Lists: pgsql-hackers

> On Jan 24, 2020, at 8:36 AM, David Steele <david(at)pgmasters(dot)net> wrote:
>
>> I don't entirely follow why we're discussing this at all, if the
>> requirement is backing up a PG data directory. There are not, and
>> are never likely to be, any legitimate files with non-ASCII names
>> in that context. Why can't we just skip any such files?
>
> It's not uncommon in my experience for users to drop odd files into PGDATA (usually versioned copies of postgresql.conf, etc.), but I agree that it should be discouraged. Even so, I don't recall ever seeing any non-ASCII filenames.
>
> Skipping files sounds scary, I'd prefer an error or a warning (and then base64 encode the filename).

I tend to agree with Tom. We know that postgres doesn’t write any such files now, and if we ever decided to change that, we could change this, too. So for now, we can assume any such files are not ours. Either the user manually scribbled in this directory, or had a tool (antivirus checksum file, vim .WHATEVER.swp file, etc.) that did so. Raising an error would break any automated backup process that hit this issue, and base64 encoding the file name and backing up the file contents could grab data that the user would not reasonably expect in the backup.

But this argument applies equally well to such files regardless of filename encoding. It would be odd to back them up when they happen to be valid UTF-8/ASCII/whatever, but not do so when they are not valid. I would expect, therefore, that we only back up files which match our expected file name pattern and ignore (perhaps with a warning) everything else.
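
To make the "expected file name pattern" idea concrete, here is a minimal sketch in Python. The pattern is an assumption made up for illustration (numeric relation file names with an optional fork suffix and segment number, plus PG_VERSION); it is not a complete or authoritative list of what postgres writes, and select_for_backup is a hypothetical helper, not anything in the tree.

```python
import re
import warnings

# Hypothetical allowlist, for illustration only. A real one would have to
# cover every file name the server itself writes.
EXPECTED_NAME = re.compile(r"(?:\d+(?:_(?:fsm|vm|init))?(?:\.\d+)?|PG_VERSION)")


def select_for_backup(names):
    """Split names into (kept, skipped), warning about each skipped name."""
    kept, skipped = [], []
    for name in names:
        if EXPECTED_NAME.fullmatch(name):
            kept.append(name)
        else:
            # Warn rather than raise, so an automated backup keeps running.
            warnings.warn(f"skipping unexpected file: {name!r}")
            skipped.append(name)
    return kept, skipped
```

Note that the decision depends only on whether the name matches the pattern, so an ASCII-named editor swap file and a non-ASCII-named stray file are treated the same way.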

Quoting from Robert’s email about why we want a backup manifest seems to support this idea, at least as I see it:

> So, let's suppose we invent a backup manifest. What should it contain?
> I imagine that it would consist of a list of files, and the lengths of
> those files, and a checksum for each file. I think you should have a
> choice of what kind of checksums to use, because algorithms that used
> to seem like good choices (e.g. MD5) no longer do; this trend can
> probably be expected to continue. Even if we initially support only
> one kind of checksum -- presumably SHA-something since we have code
> for that already for SCRAM -- I think that it would also be a good
> idea to allow for future changes. And maybe it's best to just allow a
> choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
> gate, so that we can avoid bikeshedding over which one is secure
> enough. I guess we'll still have to argue about the default. I also
> think that it should be possible to build a manifest with no
> checksums, so that one need not pay the overhead of computing
> checksums if one does not wish. Of course, such a manifest is of much
> less utility for checking backup integrity, but you can still check
> that you've got the right files, which is noticeably better than
> nothing. The manifest should probably also contain a checksum of its
> own contents so that the integrity of the manifest itself can be
> verified. And maybe a few other bits of metadata, but I'm not sure
> exactly what. Ideas?
>
> Once we invent the concept of a backup manifest, what do we need to do
> with them? I think we'd want three things initially:
>
> (1) When taking a backup, have the option (perhaps enabled by default)
> to include a backup manifest.
> (2) Given an existing backup that has not got a manifest, construct one.
> (3) Cross-check a manifest against a backup and complain about extra
> files, missing files, size differences, or checksum mismatches.

Nothing in there sounds to me like it needs to include random cruft.
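
For reference, the manifest contents and the cross-check in (3) could be sketched roughly as follows in Python. The dictionary layout and the names build_manifest and verify_backup are assumptions for illustration, not a proposed format; the manifest's checksum of its own contents is left out for brevity.

```python
import hashlib
import os

# Checksum choices named in the quoted proposal; None means "no checksums".
ALGORITHMS = {"sha224", "sha256", "sha384", "sha512"}


def file_checksum(path, algorithm):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root, algorithm=None):
    """Record the size (and optionally a checksum) of every file under root."""
    if algorithm is not None and algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported checksum algorithm: {algorithm}")
    files = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            entry = {"size": os.path.getsize(path)}
            if algorithm is not None:
                entry["checksum"] = file_checksum(path, algorithm)
            files[os.path.relpath(path, root)] = entry
    return {"algorithm": algorithm, "files": files}


def verify_backup(root, manifest):
    """Return complaints about missing, extra, resized, or altered files."""
    expected = manifest["files"]
    actual = build_manifest(root, manifest["algorithm"])["files"]
    problems = [f"missing file: {p}" for p in sorted(expected.keys() - actual.keys())]
    problems += [f"extra file: {p}" for p in sorted(actual.keys() - expected.keys())]
    for p in sorted(expected.keys() & actual.keys()):
        if expected[p]["size"] != actual[p]["size"]:
            problems.append(f"size mismatch: {p}")
        elif expected[p].get("checksum") != actual[p].get("checksum"):
            problems.append(f"checksum mismatch: {p}")
    return problems
```

With algorithm=None the manifest can still catch missing files, extra files, and size changes, but not same-size content changes, which is the reduced utility Robert describes. Whatever ends up in the manifest, the verifier reports anything else in the directory as an extra file.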


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
