Document what is essential and undocumented in pg_basebackup

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, David Steele <david(at)pgmasters(dot)net>
Subject: Document what is essential and undocumented in pg_basebackup
Date: 2022-03-09 19:28:36
Message-ID: 6228FFE4.3050309@anastigmatix.net
Lists: pgsql-hackers

On 03/09/22 12:19, Stephen Frost wrote:
> Let's avoid hijacking [thread about other patch] [1]
> for an independent debate about what our documentation should or
> shouldn't include.

Agreed. New thread here.

Stephen wrote:
> Documenting everything that pg_basebackup does to make sure that the
> backup is viable might be something to work on if someone is really
> excited about this, but it's not 'dead-simple' and it's darn close to
> the bare minimum,

I wrote:
> if the claim is that an admin who relies on pg_basebackup is relying
> on essential things pg_basebackup does that have not been enumerated
> in our documentation yet, I would argue they should be.

Magnus wrote:
> For the people who want to drive their backups from a shellscript and
> for some reason *don't* want to use pg_basebackup, we need to come up
> with a different API or a different set of tools. That is not a
> documentation task. That is a "start from a list of which things
> pg_basebackup cannot do that are still simple, or that tools like
> pgbackrest cannot do if they're complicated". And then design an API
> that's actually safe and easy to use *for that usecase*.

I wrote:
> That might also be a good thing, but I don't see it as a substitute
> for documenting the present reality of what the irreducibly essential
> behaviors of pg_basebackup (or of third-party tools like pgbackrest)
> are, and why they are so.

Stephen wrote:
> I disagree. If we provided a tool then we'd document that tool and how
> users can use it, not every single step that it does (see also:
> pg_basebackup).

I could grant, arguendo, that for most cases where we've "provided a tool"
that's enough, and still distinguish pg_basebackup from those. In no
particular order:

- pg_basebackup comes late to the party. It appears in 9.1 as a tool that
conveniently automates a process (performing an online base backup)
that has already been documented since 8.0, six and a half years earlier.
(While, yes, it streams the file contents over a newly-introduced
protocol, I don't think anyone has called that one of its irreducibly
essential behaviors, or claimed that any other way of reliably copying
those contents during the backup window would be inherently flawed.)

- By the release where pg_basebackup appears, anyone who is doing
online backup and PITR is already using some other tooling (third-party
or locally developed) to do so. There may be benefits and costs in
migrating those procedures to pg_basebackup. If one of the benefits is
"your current procedures may be missing essential steps we built into
pg_basebackup but left out of our documentation" then that is important
to know for an admin who is making that decision. Even better, knowing
what those essential steps are will allow that admin to make an informed
assessment of whether the existing procedures are broken or not.

- An admin can easily judge the fitness of a typical tool.
The tool does a thing, and you can tell right away if it did the thing
you needed or not. pg_basebackup, like any backup tool, does a thing,
and you don't find out if that was the thing you needed until later,
when failure isn't an option. That's a less typical kind of tool,
for which it's less ok to be a black box.

- Ultimately, an admin's job isn't "use pg_basebackup" (or "use pgbackrest"
or "use barman"). The job is "be certain that this cluster is recoverably
backed up, and for any tool you may be using to do it, that you have the
same grasp of what the tool has done as if you had done it yourself."

In view of all that, I would think it perfectly reasonable to present
pg_basebackup as one convenient and included reference implementation
of the irreducibly essential steps of an online base backup, which we
separately document.

I don't think it is as reasonable to say, effectively, that you learn
what the irreducibly essential steps of an online base backup are by
reading the source of pg_basebackup, and then intuiting which of the
details you find there are the essential ones and which are outgrowths
of its particular design choices.

Regards,
-Chap

[1] https://www.postgresql.org/message-id/20220221172306.GA3698472%40nathanxps13
