Re: where should I stick that backup?

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: where should I stick that backup?
Date: 2020-04-06 17:32:45
Message-ID: CABUevEwgvWTZB0MoFezALHidFWahuczw5sgHxgfWjrPMwtRucQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 6, 2020 at 4:45 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> Greetings,
>
> * Noah Misch (noah(at)leadboat(dot)com) wrote:
> > On Fri, Apr 03, 2020 at 10:19:21AM -0400, Robert Haas wrote:
> > > What I'm thinking about is: suppose we add an option to pg_basebackup
> > > with a name like --pipe-output. This would be mutually exclusive with
> > > -D, but would work at least with -Ft and maybe also with -Fp. The
> > > argument to --pipe-output would be a shell command to be executed once
> > > per output file. Any instance of %f in the shell command would be
> > > replaced with the name of the file that would have been written (and
> > > %% would turn into a single %). The shell command itself would be
> > > executed via system(). So if you want to compress, but using some
> > > other compression program instead of gzip, you could do something
> > > like:
> > >
> > > pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'
> >
> > Seems good to me. I agree -Fp is a "maybe" since the overhead will be high
> > for small files.
>
> For my 2c, at least, introducing more shell commands into critical parts
> of the system is absolutely the wrong direction to go in.
> archive_command continues to be a mess that we refuse to clean up or
> even properly document and the project would be much better off by
> trying to eliminate it rather than add in new ways for users to end up
> with bad or invalid backups.

I think the bigger problem with archive_command comes more from how
it's defined to work, tbh, which leaves a lot of things open.

This sounds to me like a much narrower use-case, which makes it a lot
more OK. But I agree we have to be careful not to get back into that
whole mess. One thing would be to clearly document such things *from
the beginning*, and not try to retrofit it years later like we ended
up doing with archive_command.

And as Robert mentions downthread, the fsync() issue is definitely a
real one, but if that is documented clearly ahead of time, that's a
reasonable level of foot-gun, I'd say.
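For concreteness, the per-file expansion in the quoted --pipe-output
proposal might look roughly like the minimal C sketch below. The
function names (expand_pipe_command, pipe_one_file) are hypothetical,
not actual pg_basebackup code. Because the data has to reach the command
on its stdin, the sketch uses popen(), which, like system(), runs the
command through /bin/sh. It also shows where the fsync() problem comes
from: pg_basebackup only ever sees the pipe, never the file the command
creates, so it has no way to fsync() that file itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: expand a --pipe-output template.
 * "%f" becomes the output file name and "%%" becomes a single "%";
 * any other character (including an unrecognized "%x") is copied as-is.
 * The name is inserted unquoted, so this assumes shell-safe file names.
 * Returns a malloc'd string the caller must free, or NULL on OOM.
 */
static char *
expand_pipe_command(const char *tmpl, const char *filename)
{
    size_t flen = strlen(filename);
    size_t cap = 1;             /* room for the terminating NUL */
    const char *p;
    char *out, *q;

    /* First pass: compute the length of the expanded command. */
    for (p = tmpl; *p; p++)
    {
        if (p[0] == '%' && (p[1] == 'f' || p[1] == '%'))
        {
            cap += (p[1] == 'f') ? flen : 1;
            p++;                /* skip the escape character */
        }
        else
            cap += 1;
    }

    out = malloc(cap);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting the escapes. */
    for (p = tmpl, q = out; *p; p++)
    {
        if (p[0] == '%' && p[1] == 'f')
        {
            memcpy(q, filename, flen);
            q += flen;
            p++;
        }
        else if (p[0] == '%' && p[1] == '%')
        {
            *q++ = '%';
            p++;
        }
        else
            *q++ = *p;
    }
    *q = '\0';
    return out;
}

/*
 * Hypothetical: stream one output file through the user's command,
 * which reads the data on stdin (e.g. 'bzip2 > %f.bz2').
 * Durability of the resulting file is entirely up to the command;
 * we only get the shell's exit status back.
 */
static int
pipe_one_file(const char *tmpl, const char *filename,
              const char *data, size_t len)
{
    char *cmd = expand_pipe_command(tmpl, filename);
    FILE *fp;

    if (cmd == NULL)
        return -1;
    fp = popen(cmd, "w");       /* runs "sh -c cmd", like system() */
    free(cmd);
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len)
    {
        pclose(fp);
        return -1;
    }
    /* pclose() returns the shell's wait status; 0 means exit code 0. */
    return pclose(fp) == 0 ? 0 : -1;
}
```

With this, pipe_one_file("bzip2 > %f.bz2", "base.tar", buf, len) would
run "bzip2 > base.tar.bz2" and feed it the tar data.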

> Further, having a generic shell script approach like this would result
> in things like "well, we don't need to actually add support for X, Y or
> Z, because we have this wonderful generic shell script thing and you can
> write your own, and therefore we won't accept patches which do add those
> capabilities because then we'd have to actually maintain that support."

In principle, I agree with "shellscripts suck".

Now, if we were just talking about compression, it would actually be
interesting to implement some sort of "postgres compression API", if
you will, implemented by a shared library. This library could then be
used from pg_basebackup or from anything else that needs compression.
And anybody who wants to could then write a "<compression X> for
PostgreSQL" module, removing the need for us to carry such code
upstream.
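To make the idea of a pluggable compression API a bit more concrete,
here is a minimal C sketch. Every name in it (CompressionMethod,
register_compression_method, find_compression_method, the toy "rle"
method) is hypothetical; nothing like this exists in PostgreSQL. A real
module would be a shared library loaded at run time, for example with
dlopen()/dlsym(), that hands over its table of callbacks. The sketch
leaves the loading out and just defines a toy run-length encoder
in-process as the "module".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback table that a compression module would export. */
typedef struct CompressionMethod
{
    const char *name;
    /* Each returns the number of bytes written to dst, or -1 if dst is too small. */
    long (*compress) (const unsigned char *src, size_t srclen,
                      unsigned char *dst, size_t dstcap);
    long (*decompress) (const unsigned char *src, size_t srclen,
                        unsigned char *dst, size_t dstcap);
} CompressionMethod;

/* Built-in "none" method: copies data unchanged. */
static long
none_copy(const unsigned char *src, size_t srclen,
          unsigned char *dst, size_t dstcap)
{
    if (srclen > dstcap)
        return -1;
    memcpy(dst, src, srclen);
    return (long) srclen;
}

static const CompressionMethod none_method = {"none", none_copy, none_copy};

/* Toy "module": run-length encoding as (count, byte) pairs, runs up to 255. */
static long
rle_compress(const unsigned char *src, size_t srclen,
             unsigned char *dst, size_t dstcap)
{
    size_t i = 0, o = 0;

    while (i < srclen)
    {
        size_t run = 1;

        while (i + run < srclen && run < 255 && src[i + run] == src[i])
            run++;
        if (o + 2 > dstcap)
            return -1;
        dst[o++] = (unsigned char) run;
        dst[o++] = src[i];
        i += run;
    }
    return (long) o;
}

static long
rle_decompress(const unsigned char *src, size_t srclen,
               unsigned char *dst, size_t dstcap)
{
    size_t o = 0;

    for (size_t i = 0; i + 1 < srclen; i += 2)
    {
        size_t run = src[i];

        if (o + run > dstcap)
            return -1;
        memset(dst + o, src[i + 1], run);
        o += run;
    }
    return (long) o;
}

static const CompressionMethod rle_method = {"rle", rle_compress, rle_decompress};

/* Registry; a module loaded at run time would register itself here. */
#define MAX_METHODS 8
static const CompressionMethod *methods[MAX_METHODS] = {&none_method};
static int n_methods = 1;

static int
register_compression_method(const CompressionMethod *m)
{
    if (n_methods >= MAX_METHODS)
        return -1;
    methods[n_methods++] = m;
    return 0;
}

static const CompressionMethod *
find_compression_method(const char *name)
{
    for (int i = 0; i < n_methods; i++)
        if (strcmp(methods[i]->name, name) == 0)
            return methods[i];
    return NULL;
}
```

The point of the callback table is that pg_basebackup (or any other
caller) would only ever look methods up by name, so adding a new
algorithm would not require touching the caller.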

There have been discussions of that for the backend before, IIRC, but I
don't recall the conclusions. In particular, I don't recall whether they
included the idea of being able to use it in situations like this as
well, and with *run-time loading*.

That said, we'd then be limiting ourselves to compression. We'd still
need a way to deal with encryption...

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
