Re: BUG #15335: Documentation is wrong about archive_command and existing files

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: spam_from_pgsql_lists(at)chezphil(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15335: Documentation is wrong about archive_command and existing files
Date: 2018-08-16 17:01:24
Message-ID: 20180816170124.GP3326@tamriel.snowman.net
Lists: pgsql-bugs

Greetings,

* PG Bug reporting form (noreply(at)postgresql(dot)org) wrote:
> The docs in section 25.3.1 say that archive_command should check if the
> target file already exists and fail in that case. It seems that this is not
> entirely true; the command should succeed if the target file already exists
> and its content is the same as the source.
> This is explicitly mentioned in section 26.2.9 for the case of cascaded
> replication with a shared archive, but I understand that this is actually
> needed in all cases. I encountered this during a failed attempt at
> promotion, but there are likely to be other cases. Quoting David Steele
> from the -general mailing list:
>
> "Duplicate WAL is possible in *all* cases. A trivial example is that
> Postgres calls archive_command and it succeeds but an error happens (e.g.
> network) right before Postgres is notified. It will wait a bit and try the
> same WAL segment again."

So, I agree with all of the above.

> Note that the example archive commands in the documentation (using cp) get
> this wrong. Minimal examples of archive commands that do this check
> correctly would be very useful.

I agree that the example command in the documentation which uses 'cp'
gets that wrong; it gets quite a few other things wrong too, really.

That is the problem, though: there really isn't any way to have a
reasonable and *correct* "minimal" example of an archive command. To do
this correctly, you'd really need to compare (or checksum) the file
that's in the archive with the file that's being proposed to be added to
the archive and then say everything is fine if they match, and throw an
error if they don't. Of course, as just discussed elsewhere, you should
really also be checking the system-ID in the WAL files against the
system-ID of the WAL archive to make sure you aren't archiving WAL files
from one system into the archive of another (or across a pg_upgrade),
and further, you should be fsync'ing the WAL file after it's been copied
before telling PG that it's been saved... Sadly, accomplishing all of
that in a one-line shell command is not really feasible.
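
To make the above a bit more concrete, here is a rough sketch (not a
production archive command) of the compare-then-succeed and fsync steps
as a small Python script. The function name archive_wal, the ".tmp"
suffix, and the script layout are all made up for illustration. It
assumes a POSIX system and a single archiver writing to the directory,
and it deliberately leaves out the system-ID check, which needs
knowledge of the WAL file format.

```python
import filecmp
import os
import shutil
import sys


def archive_wal(src_path, archive_dir):
    """Return 0 if the file is safely in the archive, 1 otherwise.

    Hypothetical helper: it does NOT check the system-ID and does not
    guard against two archivers racing on the same directory.
    """
    dest_path = os.path.join(archive_dir, os.path.basename(src_path))

    if os.path.exists(dest_path):
        # Duplicate delivery: succeed only if the contents are identical
        # (shallow=False forces a byte-for-byte comparison).
        same = filecmp.cmp(src_path, dest_path, shallow=False)
        return 0 if same else 1

    # Copy to a temporary name so a partial file never appears under
    # the final name, and fsync the data before renaming.
    tmp_path = dest_path + ".tmp"
    with open(src_path, "rb") as src, open(tmp_path, "wb") as tmp:
        shutil.copyfileobj(src, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.rename(tmp_path, dest_path)

    # fsync the directory so the rename itself is durable (POSIX only).
    dir_fd = os.open(archive_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(archive_wal(sys.argv[1], sys.argv[2]))
```

Something like this could be called with %p as the first argument and
the archive directory as the second (the script location being whatever
you choose). Even this much is well past what fits in a one-line
archive_command, and it still skips checks a real tool should do.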

> (I worry that the non-WAL files that archive_command and restore_command are
> also invoked for, e.g. the .backup and .history files, have some additional
> or possibly even conflicting requirements.)

Not really sure what you're getting at here.

Thanks!

Stephen
