Re: where should I stick that backup?

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: where should I stick that backup?
Date: 2020-04-09 22:44:48
Message-ID: 20200409224448.GC26811@momjian.us
Lists: pgsql-hackers

On Thu, Apr 9, 2020 at 04:15:07PM -0400, Stephen Frost wrote:
> Greetings,
>
> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > I think we need to step back and look at the larger issue. The real
> > argument goes back to the Unix command-line API vs the VMS/Windows API.
> > The former has discrete parts that can be stitched together, while the
> > VMS/Windows API presents a more duplicative but more holistic API for
> > every piece. We have discussed using shell commands for
> > archive_command, and even more recently, for the server pass phrase.
>
> When it comes to something like the server pass phrase, it seems much
> more reasonable to consider using a shell script (though still perhaps
> not ideal) because it's not involved directly in ensuring that the data
> is reliably stored and it's pretty clear that if it doesn't work the
> worst thing that happens is that the database doesn't start up, but it
> won't corrupt any data or destroy it or do other bad things.

Well, the pass phrase relates to security, so it is important too. I
don't think the _importance_ of the action is the deciding factor.
Rather, I think it is how well the action fits the shell script
API.

> > To get more specific, I think we have to understand how the
> > _requirements_ of the job match the shell script API, with stdin,
> > stdout, stderr, return code, and command-line arguments. Looking at
> > archive_command, the command-line arguments allow specification of file
> > names, but quoting can be complex. The error return code and stderr
> > output seem to work fine. There is no clean API for fsync and testing
> > if the file exists, so all of that has to be done by hand in one
> > command line. This is why many users use pre-written archive_command
> > shell scripts.
>
> We aren't really considering all of the use-cases though; specifically,
> things like pushing to s3 or gcs require, at least, good retry logic,
> and that's without starting to think about things like high-rate systems
> (spawning lots of new processes isn't free, particularly if they're
> written in shell script but any interpreted language is expensive) and
> wanting to parallelize.

Good point, but if there are multiple APIs, it makes shell script
flexibility even more useful.
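
To make the file-exists, copy, and fsync steps discussed above concrete,
here is a minimal sketch of what a pre-written archive script typically
has to do by hand. It is an illustration only, not anything PostgreSQL
ships: the function name archive_wal, the ".tmp" suffix, and the local
destination directory are all assumptions, and syncing the directory
this way assumes a POSIX file system.

```python
import os
import shutil
import sys


def archive_wal(src: str, dest_dir: str) -> int:
    """Copy one WAL file into dest_dir and return a shell-style exit code.

    Hypothetical helper: 0 means the file is safely archived, non-zero
    means the caller should treat the attempt as failed.
    """
    dest = os.path.join(dest_dir, os.path.basename(src))

    # Step 1: file-exists test. Never overwrite an archived segment.
    if os.path.exists(dest):
        print(f"{dest} already exists", file=sys.stderr)
        return 1

    tmp = dest + ".tmp"  # assumed temporary name, renamed once complete
    try:
        # Step 2: copy the contents to a temporary file.
        with open(src, "rb") as fin, open(tmp, "wb") as fout:
            shutil.copyfileobj(fin, fout)
            fout.flush()
            # Step 3: fsync the file so the data reaches stable storage.
            os.fsync(fout.fileno())

        # Publish the file under its final name only after it is synced.
        os.rename(tmp, dest)

        # fsync the directory so the rename itself is durable (POSIX).
        dir_fd = os.open(dest_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except OSError as exc:
        # Errors go to stderr and a non-zero return code, which is the
        # part of the shell API that already works well.
        print(f"archive failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

A small wrapper could call sys.exit(archive_wal(path, directory)) with
the path that archive_command substitutes for %p. Retry logic for remote
targets such as S3 or GCS is deliberately left out of this sketch.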

> > This brings up a few questions:
> >
> > * Should we have split apart archive_command into file-exists, copy,
> > fsync-file? Should we add that now?
>
> No. The right approach to improving on archive command is to add a way
> for an extension to take over that job, maybe with a complete background
> worker of its own, or perhaps a shared library that can be loaded by the
> archiver process, at least if we're talking about how to allow people to
> extend it.

That seems quite vague, which is the issue we had years ago when
considering doing archive_command as a link to a C library.

> Potentially a better answer is to just build this stuff into PG; things
> like "archive WAL to s3/GCS with these credentials" are what an awful
> lot of users want. There's then some who want "archive first to this
> other server, and then archive to s3/GCS", or more complex options.

Yes, we certainly know how to do a file system copy, but what about
copying files to other things like S3? I don't know how we would do
that and allow users to change things like file paths or URLs.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EnterpriseDB https://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
