Re: Incomplete docs for restore_command for hot standby

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Markus Bertheau <mbertheau(dot)pg(at)googlemail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-bugs(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: Incomplete docs for restore_command for hot standby
Date: 2008-03-04 03:32:56
Message-ID: 200803040332.m243Wvu01187@momjian.us
Lists: pgsql-bugs pgsql-patches


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Markus Bertheau wrote:
> 2008/2/22, Simon Riggs <simon(at)2ndquadrant(dot)com>:
> > On Thu, 2008-02-21 at 08:01 +0600, Markus Bertheau wrote:
> > >
> > > Section 24.3.3.1 states about restore_command:
> > >
> > > "The command will be asked for file names that are not present in the
> > > archive; it must return nonzero when so asked."
> > >
> > > Section 24.4.1 further states:
> > >
> > > "The magic that makes the two loosely coupled servers work together is
> > > simply a restore_command used on the standby that waits for the next
> > > WAL file to become available from the primary."
> > >
> > > It is not clear from the first paragraph whether the non-existing
> > > file that restore_command is being asked for is a not-yet-generated
> > > WAL file or something different. If it is a not-yet-generated WAL
> > > file, restore_command for replication has to wait for it to
> > > appear. If it is something different, restore_command for replication
> > > has to return an error right away (otherwise it would hang
> > > indefinitely, waiting for a file that is not going to appear). Yet I
> > > couldn't find hints in the documentation as to how these two cases can
> > > be detected by restore_command, i.e. how restore_command should tell a
> > > request for a WAL file from a request for a non-WAL file.
> >
> >
> > The two sentences aren't mutually exclusive, especially when you
> > consider they are discussing two different use cases. Why not read up on
> > pg_standby anyway?
>
> I read about pg_standby, but this is not about solving a particular problem but
> about missing information in the docs.
>
> > > Practice (http://archives.postgresql.org/sydpug/2006-10/msg00001.php)
> > > shows that this is a problem, and people use unproved heuristics
> > > ('history' substring in the requested file name).
> >
> >
> > Old email written during beta. Read at your own peril.
>
> The email may be old, but the problem at hand is still relevant.
>
> > > Additionally, 24.3.3 contains slightly misleading information:
> > >
> > > "It is important that the command return nonzero exit status on
> > > failure. The command will be asked for log files that are not present
> > > in the archive; it must return nonzero when so asked. This is not an
> > > error condition."
> > >
> > > This suggests that all non-existing files that restore_command will be
> > > asked for are log files. One could therefore reasonably assume that
> > > restore_command for replication should wait on all non-existing files.
> > > 24.3.3.1 later corrects this by stating that not only log files may be
> > > requested, but the earlier wording is misleading nevertheless.
> >
> >
> > If you have some suggested changes, I'd be happy to hear them.
> >
> > Probably additions are better than just changes though.
>
> What about this:
>
> *** a/doc/src/sgml/backup.sgml
> --- b/doc/src/sgml/backup.sgml
> ***************
> *** 1001,1011 **** restore_command = 'cp /mnt/server/archivedir/%f %p'
>
> <para>
> It is important that the command return nonzero exit status on failure.
> ! The command <emphasis>will</> be asked for log files that are not present
> ! in the archive; it must return nonzero when so asked. This is not an
> ! error condition. Be aware also that the base name of the <literal>%p</>
> ! path will be different from <literal>%f</>; do not expect them to be
> ! interchangeable.
> </para>
>
> <para>
> --- 1001,1011 ----
>
> <para>
> It is important that the command return nonzero exit status on failure.
> ! The command <emphasis>will</> be asked for log and other files that are
> ! not present in the archive; it must return nonzero when so asked. This is
> ! not an error condition. Be aware also that the base name of the
> ! <literal>%p</> path will be different from <literal>%f</>; do not expect
> ! them to be interchangeable.
> </para>
>
> <para>
> ***************
> *** 1576,1594 **** archive_command = 'local_backup_script.sh'
>
> <para>
> The magic that makes the two loosely coupled servers work together is
> ! simply a <varname>restore_command</> used on the standby that waits
> ! for the next WAL file to become available from the primary. The
> ! <varname>restore_command</> is specified in the
> <filename>recovery.conf</> file on the standby server. Normal recovery
> processing would request a file from the WAL archive, reporting failure
> if the file was unavailable. For standby processing it is normal for
> ! the next file to be unavailable, so we must be patient and wait for
> ! it to appear. A waiting <varname>restore_command</> can be written as
> ! a custom script that loops after polling for the existence of the next
> ! WAL file. There must also be some way to trigger failover, which should
> ! interrupt the <varname>restore_command</>, break the loop and return
> ! a file-not-found error to the standby server. This ends recovery and
> ! the standby will then come up as a normal server.
> </para>
>
> <para>
> --- 1576,1596 ----
>
> <para>
> The magic that makes the two loosely coupled servers work together is
> ! simply a <varname>restore_command</> used on the standby that, when asked
> ! for a WAL file, waits for it to become available from the primary.
> ! The <varname>restore_command</> is specified in the
> <filename>recovery.conf</> file on the standby server. Normal recovery
> processing would request a file from the WAL archive, reporting failure
> if the file was unavailable. For standby processing it is normal for
> ! the next WAL file to be unavailable, so we must be patient and wait for
> ! it to appear. For non-WAL files though the script must still report
> ! failure. WAL files can be distinguished from non-WAL files by FIXME. A
> ! waiting <varname>restore_command</> can be written as a custom script that
> ! loops after polling for the existence of the next WAL file. There must
> ! also be some way to trigger failover, which should interrupt the
> ! <varname>restore_command</>, break the loop and return a file-not-found
> ! error to the standby server. This ends recovery and the standby will then
> ! come up as a normal server.
> </para>
>
> <para>
>
> The FIXME of course needs replacement by someone in the know.
>
> Markus Bertheau
> Blog: http://www.bluetwanger.de/blog/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
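
As a purely illustrative sketch of the distinction the FIXME above asks
about (not a committed or documented method): the code below assumes that
WAL segment files have names of exactly 24 uppercase hexadecimal digits,
and that anything else, such as a .history or .backup file, is a non-WAL
file that must fail immediately when absent. The function names, the
trigger file used to signal failover, and the polling interval are all
hypothetical.

```python
import os
import re
import shutil
import time

# Assumption: a WAL segment name is exactly 24 uppercase hex digits.
# Names such as "00000002.history" or "....backup" do not match.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def is_wal_segment(file_name):
    """Return True if file_name looks like a plain WAL segment name."""
    return WAL_SEGMENT_RE.match(file_name) is not None


def restore(archive_dir, file_name, dest_path, trigger_path,
            poll_seconds=1.0):
    """Hypothetical waiting restore_command.

    Returns the exit status the command would report: 0 when the file
    was copied, nonzero when it is unavailable.
    """
    source = os.path.join(archive_dir, file_name)
    while True:
        if os.path.exists(source):
            # Note: a real script must also make sure the archived file
            # is complete before copying it; this sketch does not.
            shutil.copy(source, dest_path)
            return 0
        if not is_wal_segment(file_name):
            # Non-WAL file missing from the archive: fail right away.
            return 1
        if os.path.exists(trigger_path):
            # Failover requested: stop waiting and report file-not-found.
            return 1
        # Missing WAL segment: be patient and poll again.
        time.sleep(poll_seconds)
```

Whether a file-name pattern is a reliable test is exactly the open
question in this thread, so treat the regular expression as a placeholder
for whatever rule the documentation ends up stating.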

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +