Re: Duplicate history file?

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
Subject: Re: Duplicate history file?
Date: 2021-06-16 14:24:17
Message-ID: 20210616142417.GH20766@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Julien Rouhaud (rjuju123(at)gmail(dot)com) wrote:
> On Wed, Jun 16, 2021 at 9:19 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > This is exactly it. I don't agree that we can, or should, treat every
> > sensible thing that we realize about what the archive command or the
> > backup tool should be doing as some bug in our documentation that has to
> > be backpatched.
> > If you're serious about continuing on this path, it strikes me that the
> > next step would be to go review all of the above mentioned tools,
> > identify all of the things that they do and the checks that they have,
> > and then craft a documentation patch to add all of those- for both
> > archive command and pg_start/stop_backup.
>
> 1) I'm not saying that every single check that every single tools
> currently does is a requirement for a safe command and/or should be
> documented

That's true- you're agreeing that there are checks beyond those that
are currently implemented which should also be done. That's exactly
what I was responding to.

> 2) I don't think that there are thousands and thousands of
> requirements, as you seem to imply

You've not reviewed any of the tools which have been written and so I'm
not sure what you're basing your belief on. I've done reviews of the
various tools and have been rather involved in the development of one of
them. I do think there are lots of requirements, and it's not some static
list which could be just written down once and then never touched or
thought about again.

Consider pg_dump- do we document everything that a logical export tool
should do? That someone who wants to implement pg_dump should make sure
that the tool runs around and takes out a share lock on all of the
tables to be exported? No, of course we don't, because we provide a
tool to do that and if people actually want to understand how it works,
we point them to the source code. Had we started out with a backup tool
in core, the same would be true for that. Instead, we didn't, and such
tools were developed outside of core (and frankly have largely had to
play catch-up to try and figure out all the things that are needed to do
it well and likely always will be since they aren't part of core).

> 3) I still don't understand why you think that having a partial
> knowledge of what makes an archive_command safe scattered in the
> source code of many third party tools is a good thing

Having partial knowledge of what makes an archive_command safe in the
official documentation is somehow better...? What would that lead to-
other people seriously developing a backup solution for PG? No, I
seriously doubt that, as those who are seriously developing such
solutions couldn't trust to only what we've got documented anyway but
would have to go looking through the source code and would need to
develop a deep understanding of how WAL works, what happens when PG is
started up to perform PITR but with archiving disabled and how that
impacts what ends up being archived (hint: the server will switch
timelines but won't actually archive a history file because archiving is
disabled- a restart which then enables archiving will then start pushing
WAL on a timeline where there's no history file; do that twice from an
older backup and now you've got the same WAL files trying to be pushed
into the repo which are actually on materially different timelines even
though the same timeline has been chosen multiple times...), how
timelines work, and all the rest.
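That double-restore failure mode can be illustrated at the file level,
without a real cluster. This is a sketch only- every path, segment name,
and content string below is made up for illustration, and no PostgreSQL
binaries are involved:

```shell
#!/bin/sh
# Simulation of the scenario described above: two restores from the same
# older backup, both with archiving disabled at promotion time, both end
# up on "timeline 2" because no 00000002.history file made it into the
# archive to claim that timeline.

ARCHIVE=/tmp/tl_demo_archive
mkdir -p "$ARCHIVE"
rm -f "$ARCHIVE"/*

# Restore #1: the server switches to timeline 2, but archiving is off, so
# 00000002.history is never archived. Archiving is enabled later and the
# timeline-2 segment gets pushed- still with no history file alongside it.
seg="000000020000000000000001"
printf '%s' "records-from-restore-1" > "$ARCHIVE/$seg"

# Restore #2 from the same old backup: with no 00000002.history in the
# archive, it also selects timeline 2 and generates a *different* segment
# under the very same file name.
restore2_data="records-from-restore-2"

# The second push now collides: same name, materially different WAL.
if [ -f "$ARCHIVE/$seg" ] && \
   [ "$(cat "$ARCHIVE/$seg")" != "$restore2_data" ]; then
    echo "collision: $seg already archived with different contents"
fi
```

The point of the simulation is only that the archive ends up asked to
accept two different files under one segment name- which a naive
archive_command will silently overwrite.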

We already have partial documentation about what should go into
developing an archive_command, and what it's led to is people ignoring
that and instead copying the example that's explicitly called out as not
sufficient. That's the actual problem that needs to be addressed here.
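To make the gap concrete: a plain `cp` will happily overwrite an
already-archived segment, which is exactly the wrong behavior in the
duplicate-timeline scenario described earlier. A minimal sketch of a more
careful local archive_command follows- the function name, paths, and demo
files are all hypothetical, and a real tool needs far more (durability
guarantees, remote storage, monitoring, ...):

```shell
#!/bin/sh
# Hypothetical archive_wal sketch; PostgreSQL would invoke it roughly as
#   archive_command = 'archive_wal %p %f'
# where %p is the path to the WAL file and %f is its file name.

archive_wal() {
    src="$1"      # %p
    fname="$2"    # %f
    dest="/tmp/wal_archive_demo/$fname"

    # Never silently overwrite: if an archived copy exists, succeed only
    # when it is byte-identical (a retry of the same file), else fail hard.
    if [ -f "$dest" ]; then
        cmp -s "$src" "$dest" && return 0
        echo "conflict: $fname already archived with different contents" >&2
        return 1
    fi

    # Copy under a temporary name first, then rename, so a crash mid-copy
    # never leaves a truncated file under the final segment name.
    cp "$src" "$dest.tmp" || return 1
    sync "$dest.tmp" 2>/dev/null || true   # best-effort durability
    mv "$dest.tmp" "$dest"
}

# Demo with fake WAL files:
mkdir -p /tmp/wal_archive_demo
printf 'wal-data-1' > /tmp/000000010000000000000001
archive_wal /tmp/000000010000000000000001 000000010000000000000001 \
    && echo "archived"
archive_wal /tmp/000000010000000000000001 000000010000000000000001 \
    && echo "retry ok"
printf 'different' > /tmp/conflicting_segment
archive_wal /tmp/conflicting_segment 000000010000000000000001 \
    || echo "conflict refused"
```

The key differences from plain `cp`: an existing identical file is treated
as a successful retry, a differing file is a hard error instead of an
overwrite, and the copy lands atomically via rename- and even this sketch
only scratches the surface of what the existing tools check.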

Let's rip out the example and instead promote tools which have been
written to specifically address this and which are actively maintained.
If someone actually comes asking about how to develop their own backup
solution for PG, we should suggest that they review the PG code related
to WAL, timelines, how promotion works, etc, and probably point them at
the OSS projects which already work to tackle this issue, because to
develop a proper tool you need to actually understand all of that.

> But what better alternative are you suggesting? Say that no one
> knows what an archive_command should do and let people put a link to
> their backup solution in the hope that they will eventually converge
> to a safe solution that no one will be able to validate?

There are people who do know, today, what an archive command should do
and we should be promoting the tools that they've developed which do, in
fact, implement those checks already, at least the ones we've thought of
so far.

Instead, the suggestion being made here is to write a detailed design
document for how to develop a backup tool (and, no, I don't agree that
we can "just" focus on archive command) for PG and then to maintain it
and update it and backpatch every change to it that we think of.

Thanks,

Stephen
