Re: refactoring basebackup.c

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>
Subject: Re: refactoring basebackup.c
Date: 2022-02-02 15:55:53
Message-ID: CA+TgmoYzYVwkDpwHru5+Lf5f0u4cTN6FeBLsJeHxXTBS7v=11A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 18, 2022 at 1:55 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> 0001 adds "server" and "blackhole" as backup targets. It now has some
> tests. This might be more or less ready to ship, unless somebody else
> sees a problem, or I find one.

I played around with this a bit and it seems quite easy to extend this
further. So please find attached a couple more patches to generalize
this mechanism.

0001 adds an extensibility framework for backup targets. The idea is
that an extension loaded via shared_preload_libraries can call
BaseBackupAddTarget() to define a new base backup target, which the
user can then access via pg_basebackup --target TARGET_NAME, or if
they want to pass a detail string, pg_basebackup --target
TARGET_NAME:DETAIL. There might be slightly better ways of hooking
this into the system. I'm not unhappy with this approach, but there
might be a better idea out there.

0002 adds an example contrib module called basebackup_to_shell. The
system administrator can set basebackup_to_shell.command='SOMETHING'.
A backup directed to the 'shell' target will cause the server to
execute the configured command once per generated archive, and once
for the backup_manifest, if any. When executing the command, %f gets
replaced with the archive filename (e.g. base.tar) and %d gets
replaced with the detail. The actual contents of the file are passed
to the command's standard input, and it can then do whatever it likes
with that data. Clearly, this is not state of the art; for instance,
if what you really want is to upload the backup files someplace via
HTTP, using this to run 'curl' is probably not so good of an idea as
using an extension module that links with libcurl. That would likely
lead to better error checking, better performance, nicer
configuration, and just generally fewer things that can go wrong. On
the other hand, writing an integration in C is kind of tricky, and
this thing is quite easy to use -- and it does work.

There are a couple of things to be concerned about with 0002 from a
security perspective. First, in a backend environment, we have a
function to spawn a subprocess via popen(), namely OpenPipeStream(),
but there is no function to spawn a subprocess with execve() and end
up with a socket connected to its standard input. And that means that
whatever command the administrator configures is being interpreted by
the shell, which is a potential problem given that we're interpolating
the target detail string supplied by the user, who must have at least
replication privileges but need not be the superuser. I chose to
handle this by allowing the target detail to contain only alphanumeric
characters. Refinement is likely possible, but whether the effort is
worthwhile seems questionable. Second, what if the superuser wants to
allow the use of this module to only some of the users who have
replication privileges? That seems a bit unlikely but it's possible,
so I added a GUC basebackup_to_shell.required_role. If set, the
functionality is only usable by members of the named role. If unset,
anyone with replication privilege can use it. I guess someone could
criticize this as defaulting to the least secure setting, but
considering that you have to have replication privileges to use this
at all, I don't find that argument much to get excited about.

I have to say that I'm incredibly happy with how easy these patches
were to write. I think this is going to make adding new base backup
targets as accessible as we can realistically hope to make it. There
is some boilerplate code, as an examination of the patches will
reveal, but it's not a lot, and at least IMHO it's pretty
straightforward. Granted, coding up a new base backup target is
something only experienced C hackers are likely to do, but the fact
that I was able to throw this together so quickly suggests to me that
I've got the design basically right, and that anyone who does want to
plug into the new mechanism shouldn't have too much trouble doing so.

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Allow-extensions-to-add-new-backup-targets.patch application/octet-stream 17.1 KB
0002-Add-basebackup_to_shell-contrib-module.patch application/octet-stream 13.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-02-02 15:57:28 Re: Server-side base backup: why superuser, not pg_write_server_files?
Previous Message Alvaro Herrera 2022-02-02 15:48:22 Re: Ensure that STDERR is empty during connect_ok