Re: Duplicate history file?

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Duplicate history file?
Date: 2021-06-10 14:00:21
Message-ID: 20210610140020.GR20766@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Kyotaro Horiguchi (horikyota(dot)ntt(at)gmail(dot)com) wrote:
> At Wed, 09 Jun 2021 16:56:14 +0900, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> wrote in
> > On 2021/06/09 16:23, Fujii Masao wrote:
> > > Instead, we should consider and document "better" command for
> > > archive_command, or implement something like pg_archivecopy command
> > > into the core (as far as I remember, there was the discussion about
> > > this feature before...)?
> >
> > I agree with that idea.
> > Since archiving is important for all users, I think there should be
> > either a better and safer command in the documentation, or an archive
> > command (pg_archivecopy?) that we provide as a community, as you said.
> > I am curious about the conclusions of past discussions. :)
>
> How perfect does the officially-provided script or command need to be? The
> reason that the script in the documentation is so simple is, I guess,
> that we don't/can't offer steps sufficiently solid in all directions.
>
> We didn't notice that the "test ! -f" does harm, so it has been
> there, but we finally need to remove it. Instead, we need to write
> down the known significant requirements in words. I'm afraid that a
> concrete script would be a bit too complex for the documentation.

We don't have any 'officially-provided' tool for archive command.

> So what we can do is:
>
> - Remove the "test ! -f" from the sample command (for *nixen).

... or just remove the example entirely. It really doesn't do anything
good for us, in my view.

> - Rewrite at least the following portion in the documentation. [1]
>
> > The archive command should generally be designed to refuse to
> > overwrite any pre-existing archive file. This is an important
> > safety feature to preserve the integrity of your archive in case
> > of administrator error (such as sending the output of two
> > different servers to the same archive directory).
> >
> > It is advisable to test your proposed archive command to ensure
> > that it indeed does not overwrite an existing file, and that it
> > returns nonzero status in this case. The example command above
> > for Unix ensures this by including a separate test step. On some
> > Unix platforms, cp has switches such as -i that can be used to do
> > the same thing less verbosely, but you should not rely on these
> > without verifying that the right exit status is returned. (In
> > particular, GNU cp will return status zero when -i is used and
> > the target file already exists, which is not the desired
> > behavior.)
>
> The replacement would be something like:
>
> "There is a case where WAL file and timeline history files is archived
> more than once. The archive command should generally be designed to
> refuse to replace any pre-existing archive file with a file with
> different content but to return zero if the file to be archived is
> identical with the preexisting file."
>
> But I'm not sure how it looks like.. (even ignoring the broken
> phrasing..)

There is so much more that we should be including here, like "you should
make sure your archive command will reliably sync the WAL file to disk
before returning success to PG, since PG will feel free to immediately
remove the WAL file once archive command has returned successfully", and
"the archive command should check that there exists a .history file for
any timeline after timeline 1 in the repo for the WAL file that's being
archived" and "the archive command should allow an existing, binary
identical WAL file to be archived multiple times without error, but
should error if a new WAL file is archived which would overwrite a
binary distinct WAL file in the repo", and "the archive command should
check the WAL header to make sure that the WAL file matches the cluster
in the corresponding backup repo", and "whatever is expiring the WAL
files after they've been archived should make sure to not expire out any
WAL that is needed for any of the backups that remain", and "oh, by the
way, depending on the exit code of the command, PG may consider the
failure to be something which can be retried, or not", and other things
that I can't think of off the top of my head right now.
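
To make just two of those points concrete (re-archiving a binary
identical file without error, and syncing to disk before reporting
success), here is a minimal Python sketch. The function name
archive_wal, the temporary-file prefix, and the use of 0/1 as return
values are assumptions made for illustration, not anything PG provides.
It assumes a POSIX filesystem that supports hard links and directory
fsync, and it deliberately ignores the .history, WAL header, expiry, and
retry-versus-fatal exit code concerns listed above.

```python
import filecmp
import os
import shutil
import tempfile


def archive_wal(src, archive_dir):
    """Hypothetical helper: return 0 on success, nonzero on failure."""
    dest = os.path.join(archive_dir, os.path.basename(src))

    if os.path.exists(dest):
        # Same bytes already archived: report success so a repeat is harmless.
        if filecmp.cmp(src, dest, shallow=False):
            return 0
        # Different content under the same name: refuse to overwrite.
        return 1

    # Copy to a temporary file in the same directory and sync its contents.
    fd, tmp = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())
        # os.link fails if dest appeared in the meantime, so nothing
        # already in the archive is ever replaced.
        os.link(tmp, dest)
    except OSError:
        return 1
    finally:
        if os.path.exists(tmp):
            os.unlink(tmp)

    # Sync the directory so the new file name itself survives a crash.
    dir_fd = os.open(archive_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return 0
```

Even this much is noticeably longer than a one-line cp, and it still
covers only a small part of the list above.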

I have to say that it gets to a point where it feels like we're trying
to document everything about writing a C extension to PG using the
hooks which we make available. We've generally agreed that folks should
be looking at the source code if they're writing a serious C extension
and it's certainly the case that, in writing a serious archive command
and backup tool, getting into the PG source code has been routinely
necessary.

Thanks,

Stephen
