Re: Duplicate history file?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: sfrost(at)snowman(dot)net
Cc: tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Duplicate history file?
Date: 2021-06-11 02:25:51
Message-ID: 20210611.112551.3268858744197516.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Thu, 10 Jun 2021 10:00:21 -0400, Stephen Frost <sfrost(at)snowman(dot)net> wrote in
> Greetings,
>
> * Kyotaro Horiguchi (horikyota(dot)ntt(at)gmail(dot)com) wrote:
> > At Wed, 09 Jun 2021 16:56:14 +0900, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> wrote in
> > > On 2021/06/09 16:23, Fujii Masao wrote:
> > > > Instead, we should consider and document "better" command for
> > > > archive_command, or implement something like pg_archivecopy command
> > > > into the core (as far as I remember, there was the discussion about
> > > > this feature before...)?
> > >
> > > I agree with that idea.
> > > Since archiving is important for all users, I think there should be
> > > either a better and safer command in the documentation, or an archive
> > > command (pg_archivecopy?) that we provide as a community, as you said.
> > > I am curious about the conclusions of past discussions. :)
> >
> > How perfect does the officially-provided script or command need to
> > be? The reason that the script in the documentation is so simple is,
> > I guess, that we don't/can't offer steps that are sufficiently solid
> > in all respects.
> >
> > We hadn't noticed that the "test ! -f" is harmful, so it has stayed
> > there, but in the end we need to remove it. Instead, we need to write
> > down the known significant requirements in words. I'm afraid that a
> > concrete script would be a bit complex for the documentation..
>
> We don't have any 'officially-provided' tool for archive command.

By "officially-provided script" I meant the "test ! -f .." example. The
fact that we show it in the documentation (without a caveat) implies
that the script at least doesn't break a server that is running
normally, including promotion.
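
To make the failure concrete, here is a small sketch (not a proposal for
the documentation). It assumes a POSIX sh and cp are available; the
shell snippets mirror the shape of the documented sample, with %p and %f
passed as $1 and $2, and the file names and helper functions are made up
for illustration. It archives the same file twice with each command and
reports both exit codes.

```python
import os
import subprocess
import tempfile

# Shape of the documented sample; "$1" stands in for %p and "$2" for %f.
DOC_SAMPLE = 'test ! -f "$2" && cp "$1" "$2"'
# The same command without the existence check.
PLAIN_CP = 'cp "$1" "$2"'


def run_archive_command(command, src, dst):
    """Run a shell snippet the way archive_command would; return its exit code."""
    return subprocess.run(["sh", "-c", command, "sh", src, dst]).returncode


def archive_twice(command):
    """Archive one (dummy) history file twice; return both exit codes."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "00000002.history")
        dst = os.path.join(d, "archived_00000002.history")
        with open(src, "w") as f:
            f.write("dummy timeline history\n")  # placeholder content
        first = run_archive_command(command, src, dst)
        second = run_archive_command(command, src, dst)
        return first, second
```

With the "test ! -f" form the second attempt returns non-zero even
though the file already in the archive is identical; with plain cp both
attempts return zero, at the cost of silently overwriting whatever was
there.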

> > So what we can do that is:
> >
> > - Remove the "test ! -f" from the sample command (for *nixen).
>
> ... or just remove the example entirely. It really doesn't do anything
> good for us, in my view.

Yeah, I feel the same. But that also means the instructions on how to
use the replacements disappear from our documentation. The least
problematic example in that regard is just "cp .." without the "test",
serving as the instruction.

> > The replacement would be something like:
> >
> > "There are cases where WAL files and timeline history files are
> > archived more than once. The archive command should generally be
> > designed to refuse to replace any pre-existing archive file with a
> > file with different content, but to return zero if the file to be
> > archived is identical to the pre-existing file."
> >
> > But I'm not sure how it looks like.. (even ignoring the broken
> > phrasing..)
>
> There is so much more that we should be including here, like "you should

Mmm. Yeah, I think I understand your sentiment completely.

> make sure your archive command will reliably sync the WAL file to disk
> before returning success to PG, since PG will feel free to immediately
> remove the WAL file once archive command has returned successfully", and
> "the archive command should check that there exists a .history file for
> any timeline after timeline 1 in the repo for the WAL file that's being
> archived" and "the archive command should allow an existing, binary
> identical WAL file to be archived multiple times without error, but
> should error if a new WAL file is archived which would overwrite a
> binary distinct WAL file in the repo", and "the archive command should
> check the WAL header to make sure that the WAL file matches the cluster
> in the corresponding backup repo", and "whatever is expiring the WAL
> files after they've been archived should make sure to not expire out any
> WAL that is needed for any of the backups that remain", and "oh, by the
> way, depending on the exit code of the command, PG may consider the
> failure to be something which can be retried, or not", and other things
> that I can't think of off the top of my head right now.
> I have to say that it gets to a point where it feels like we're trying
> to document everything about writing a C extension to PG using the
> hooks which we make available. We've generally agreed that folks should
> be looking at the source code if they're writing a serious C extension
> and it's certainly the case that, in writing a serious archive command
> and backup tool, getting into the PG source code has been routinely
> necessary.

I agree with that. Nevertheless, don't we still need a minimal workable
setup as a first step? Something like the following.

===
The following is an example of a minimal archive_command.

Example: cp %p /blah/%f

However, it is far from perfect. The following is a discussion of
what is needed for archive_command to be more reliable.

<the long list of the requirements>
===

Anyway, it doesn't seem to be the time to do that. But now that we know
there's a case where the current example prevents PG from working
correctly, we cannot use the "test ! -f" example and cannot suggest "do
not overwrite existing archived files" without a caveat. Don't we at
least need to *fix* those parts for now?
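
For illustration only, here is a minimal sketch of a command that meets
two of the requirements discussed above: it returns zero when the file
to be archived is identical to the one already in the archive, refuses
to replace a file with different content, and syncs the copy to disk
before reporting success. The function name and the ".tmp" suffix are
hypothetical, the directory fsync assumes a POSIX system, and it makes
no attempt to cover the rest of the list (history file checks, WAL
header checks, expiry, exit code classes).

```python
import filecmp
import os
import shutil


def archive_file(src, dst):
    """Return 0 on success, 1 if dst already exists with different content."""
    if os.path.exists(dst):
        # The same file may be archived more than once: succeed only if
        # the existing copy is byte-for-byte identical.
        return 0 if filecmp.cmp(src, dst, shallow=False) else 1

    # Copy to a temporary name first so a partial file never appears
    # under the final name.
    tmp = dst + ".tmp"
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # data on disk before we report success
    os.rename(tmp, dst)

    # Also flush the directory entry for the rename (POSIX only).
    dir_fd = os.open(os.path.dirname(dst) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return 0
```

A wrapper script that exits with archive_file(sys.argv[1], sys.argv[2])
could then be called with %p and /blah/%f as its two arguments. Even
this small sketch is already longer than what fits comfortably in the
documentation.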

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
