Re: Incomplete docs for restore_command for hot standby

From: "Markus Bertheau" <mbertheau(dot)pg(at)googlemail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: Incomplete docs for restore_command for hot standby
Date: 2008-02-25 11:56:01
Message-ID: 684362e10802250356w1fe820f8i2d5207801c18daf0@mail.gmail.com
Lists: pgsql-bugs pgsql-patches

2008/2/22, Simon Riggs <simon(at)2ndquadrant(dot)com>:
> On Thu, 2008-02-21 at 08:01 +0600, Markus Bertheau wrote:
> >
> > Section 24.3.3.1 states about restore_command:
> >
> > "The command will be asked for file names that are not present in the
> > archive; it must return nonzero when so asked."
> >
> > Section 24.4.1 further states:
> >
> > "The magic that makes the two loosely coupled servers work together is
> > simply a restore_command used on the standby that waits for the next
> > WAL file to become available from the primary."
> >
> > It is not clear from the first paragraph, whether the non-existing
> > file that restore_command is being asked for is a not-yet-generated
> > WAL file or something different. If it was a not-yet-generated WAL
> > file, restore_command for replication would have to wait for it to
> > appear. If it was something different, restore_command for replication
> > would have to return an error right away. (Because else it would hang
> > indefinitely, waiting for a file that is not going to appear). Yet I
> > couldn't find hints in the documentation as to how these two cases can
> > be detected by restore_command, i.e. how restore_command should tell a
> > request for a WAL file from a request for a non-WAL file.
>
>
> The two sentences aren't mutually exclusive, especially when you
> consider they are discussing two different use cases. Why not read up on
> pg_standby anyway?

I have read about pg_standby, but my point is not to solve a particular
problem; it is that information is missing from the docs.

> > Practice (http://archives.postgresql.org/sydpug/2006-10/msg00001.php)
> > shows that this is a problem, and people use unproved heuristics
> > ('history' substring in the requested file name).
>
>
> Old email written during beta. Read at your own peril.

The email may be old, but the problem at hand is still relevant.

> > Additionally, 24.3.3 contains slightly misleading information:
> >
> > "It is important that the command return nonzero exit status on
> > failure. The command will be asked for log files that are not present
> > in the archive; it must return nonzero when so asked. This is not an
> > error condition."
> >
> > This suggests that all non-existing files that restore_command will be
> > asked for are log files. One could therefore reasonably assume that
> > restore_command for replication should wait on all non-existing files.
> > 24.3.3.1 later corrects this by stating that not only log files may be
> > requested, but nevertheless.
>
>
> If you have some suggested changes, I'd be happy to hear them.
>
> Probably additions are better than just changes though.

What about this:

*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 1001,1011 **** restore_command = 'cp /mnt/server/archivedir/%f %p'

<para>
It is important that the command return nonzero exit status on failure.
! The command <emphasis>will</> be asked for log files that are not present
! in the archive; it must return nonzero when so asked. This is not an
! error condition. Be aware also that the base name of the <literal>%p</>
! path will be different from <literal>%f</>; do not expect them to be
! interchangeable.
</para>

<para>
--- 1001,1011 ----

<para>
It is important that the command return nonzero exit status on failure.
! The command <emphasis>will</> be asked for log and other files that are
! not present in the archive; it must return nonzero when so asked. This is
! not an error condition. Be aware also that the base name of the
! <literal>%p</> path will be different from <literal>%f</>; do not expect
! them to be interchangeable.
</para>

<para>
***************
*** 1576,1594 **** archive_command = 'local_backup_script.sh'

<para>
The magic that makes the two loosely coupled servers work together is
! simply a <varname>restore_command</> used on the standby that waits
! for the next WAL file to become available from the primary. The
! <varname>restore_command</> is specified in the
<filename>recovery.conf</> file on the standby server. Normal recovery
processing would request a file from the WAL archive, reporting failure
if the file was unavailable. For standby processing it is normal for
! the next file to be unavailable, so we must be patient and wait for
! it to appear. A waiting <varname>restore_command</> can be written as
! a custom script that loops after polling for the existence of the next
! WAL file. There must also be some way to trigger failover, which should
! interrupt the <varname>restore_command</>, break the loop and return
! a file-not-found error to the standby server. This ends recovery and
! the standby will then come up as a normal server.
</para>

<para>
--- 1576,1596 ----

<para>
The magic that makes the two loosely coupled servers work together is
! simply a <varname>restore_command</> used on the standby that, when asked
! for a WAL file, waits for it to become available from the primary.
! The <varname>restore_command</> is specified in the
<filename>recovery.conf</> file on the standby server. Normal recovery
processing would request a file from the WAL archive, reporting failure
if the file was unavailable. For standby processing it is normal for
! the next WAL file to be unavailable, so we must be patient and wait for
! it to appear. For non-WAL files though the script must still report
! failure. WAL files can be distinguished from non-WAL files by FIXME. A
! waiting <varname>restore_command</> can be written as a custom script that
! loops after polling for the existence of the next WAL file. There must
! also be some way to trigger failover, which should interrupt the
! <varname>restore_command</>, break the loop and return a file-not-found
! error to the standby server. This ends recovery and the standby will then
! come up as a normal server.
</para>

<para>

The FIXME of course needs replacement by someone in the know.
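
To make the distinction concrete, here is a rough sketch of what a waiting
restore_command could look like. It is only an illustration under one big
assumption: that a WAL segment can be recognized by a file name made of
exactly 24 hexadecimal digits, and that anything else (for example names
ending in .history or .backup) is a non-WAL file. The docs do not state that
rule, which is exactly what the FIXME is about. The function names, the
trigger file and the argument order are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical waiting restore_command sketch (not taken from the docs)."""
import os
import re
import shutil
import sys
import time

# ASSUMPTION: WAL segment names are exactly 24 hex digits. Everything else
# (e.g. "00000002.history") is treated as a non-WAL file.
WAL_SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def is_wal_segment(file_name):
    """Return True if file_name looks like a WAL segment (assumed rule)."""
    return WAL_SEGMENT_RE.fullmatch(file_name) is not None


def restore(archive_dir, file_name, dest_path, trigger_path, poll_seconds=1.0):
    """Copy file_name from the archive to dest_path and return an exit status.

    Missing WAL segments are waited for; missing non-WAL files fail at once.
    The existence of trigger_path (a hypothetical failover trigger file)
    breaks the wait and reports file-not-found.
    """
    source = os.path.join(archive_dir, file_name)
    while True:
        if os.path.exists(source):
            shutil.copy(source, dest_path)
            return 0
        if not is_wal_segment(file_name):
            return 1  # non-WAL file not in the archive: do not wait
        if os.path.exists(trigger_path):
            return 1  # failover requested: stop waiting, end recovery
        time.sleep(poll_seconds)


# Only act when invoked with the expected four arguments:
#   ARCHIVE_DIR %f %p TRIGGER_FILE
if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(restore(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))
```

With a script like this saved as, say, wait_restore.py, recovery.conf might
contain something like
restore_command = 'wait_restore.py /mnt/server/archivedir %f %p /tmp/trigger'.
The sketch does not guard against copying a file that is still being written
to the archive, so it is not a substitute for pg_standby.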

Markus Bertheau
Blog: http://www.bluetwanger.de/blog/
