Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
Date: 2010-02-12 05:38:32
Message-ID: 3f0b79eb1002112138n61a3258fg9986e50751d44ea0@mail.gmail.com
Lists: pgsql-committers pgsql-docs pgsql-hackers

On Thu, Feb 11, 2010 at 11:22 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Simon Riggs wrote:
>> Might it not be simpler to add a parameter onto pg_standby?
>> We send %s to tell pg_standby the standby_mode of the server which is
>> calling it so it can decide how to act in each case.
>
> That would work too, but it doesn't seem any simpler to me. On the contrary.

Agreed.

There could be three kinds of SR configurations. Let's think of them separately.

(1) SR without restore_command
(2) SR with 'cp'
(3) SR with pg_standby

(1) is the straightforward configuration. In this case the standby replays only
the WAL files in the pg_xlog directory, and starts SR when it finds an invalid
record or can find no more WAL files. If SR is then terminated for some reason,
the standby periodically tries to reconnect to the primary and restart SR. If
you choose this configuration, you don't need to care about the problem
discussed in this thread.

In case (2) the standby replays the WAL files not only in pg_xlog but also in
the archive, and starts SR when it finds an invalid record or can find no more
WAL files. If the archive is shared between the primary and the standby, the
standby might restore a partial WAL file that is still being archived (copied)
by the primary. This can happen because 'cp' is not an atomic operation.
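
To illustrate why a plain 'cp' can expose a partial file, and what an atomic
alternative might look like, here is a minimal Python sketch. It is not
something proposed in this thread: the function name archive_atomically and
the ".tmp-" prefix are hypothetical, and it assumes the archive is a local
POSIX filesystem where a rename within one directory is atomic.

```python
import os
import shutil
import tempfile


def archive_atomically(src_path: str, archive_dir: str) -> str:
    """Copy src_path into archive_dir so that a reader looking for the
    final file name never sees a half-written file (hypothetical helper)."""
    final_path = os.path.join(archive_dir, os.path.basename(src_path))
    # Write to a temporary name inside the archive directory itself, so the
    # final rename stays within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as dst, open(src_path, "rb") as src:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())
        # The file becomes visible under its real name only once complete.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

With a copy like this, a standby that restores by exact WAL file name would
see either the whole file or no file. A plain 'cp' gives no such guarantee,
which is the situation described above.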

Currently, when the standby finds a WAL file whose size is less than 16MB, it
emits a FATAL error. This is the problem that I presented upthread. That is
undesirable behavior, so I proposed to treat that case the same as if no more
WAL files were found. Then the standby can start SR instead of emitting the
FATAL error. (2) is a useful configuration, as described in Heikki's commit
message:
http://archives.postgresql.org/pgsql-committers/2010-01/msg00395.php
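
The proposed behavior for (2) can be sketched roughly as follows. This is
only a model in Python, not the server's C code: the names decide_next_step
and NextStep are made up for illustration, and the 16MB constant is the
segment size mentioned above.

```python
import os
from enum import Enum

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB, as stated in the message


class NextStep(Enum):
    REPLAY_FILE = "replay the restored WAL file"
    START_STREAMING = "treat as no more WAL and start SR"


def decide_next_step(restored_path: str) -> NextStep:
    """Model of the proposal: a partial file is handled like a missing one."""
    try:
        size = os.path.getsize(restored_path)
    except FileNotFoundError:
        # No more WAL files in the archive: start streaming replication.
        return NextStep.START_STREAMING
    if size < WAL_SEGMENT_SIZE:
        # Current behavior would be a FATAL error here; the proposal is to
        # treat the partial file the same as "no more WAL file found".
        return NextStep.START_STREAMING
    return NextStep.REPLAY_FILE
```

The only change relative to the current behavior is the middle branch: a
short file leads to starting SR rather than to a FATAL error.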

(3) was an unexpected configuration (at least for me). It would work fine as
*file-based* log shipping, but not as SR. Since pg_standby doesn't return when no
more WAL files are found in the archive (i.e., it waits until the next complete
WAL file is available), SR will never start. OTOH, since pg_standby treats a
partial file as nonexistent, the problem discussed in this thread doesn't
happen.
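
The waiting behavior described for (3) can be modeled like this. It is a
Python sketch of the behavior as described in this message, not pg_standby's
actual implementation; restore_waiting and its max_polls parameter are
hypothetical and exist only to make the loop testable.

```python
import os
import time

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB, as stated in the message


def is_complete(path: str) -> bool:
    """A missing file and a partial file are both 'not there yet'."""
    try:
        return os.path.getsize(path) >= WAL_SEGMENT_SIZE
    except FileNotFoundError:
        return False


def restore_waiting(path: str, poll_seconds: float = 1.0, max_polls=None) -> bool:
    """Wait until a complete WAL file appears at path.

    With max_polls=None the loop never gives up, so the caller never gets
    the "no more WAL" answer it needs before it can start SR.
    """
    polls = 0
    while not is_complete(path):
        if max_polls is not None and polls >= max_polls:
            return False
        time.sleep(poll_seconds)
        polls += 1
    return True
```

In this model, max_polls=None corresponds to the waiting behavior described
above, while max_polls=0 would return as soon as no complete file is found.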

Questions:
(A) Is my proposal for (2) reasonable? For me, yes.
(B) Should we allow (3) to work as "streaming replication"? That is, should we
create a new mode that makes pg_standby return as soon as it doesn't find
a complete WAL file in the archive? I agree with Heikki, i.e., I don't think
it's worth doing. Though pg_standby already has the capability to remove
old WAL files, we would still need a cron job that removes them periodically,
because pg_standby is not executed while SR is running normally.
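
For reference, the kind of periodic cleanup job mentioned in (B) might look
like the Python sketch below. Everything in it is an assumption made for
illustration: the function name remove_old_segments is hypothetical, the
caller must determine the oldest WAL file still needed by some other means,
and the plain string comparison is only meaningful for segments on a single
timeline.

```python
import os
import re

# WAL segment file names are 24 uppercase hexadecimal characters.
WAL_NAME = re.compile(r"[0-9A-F]{24}")


def remove_old_segments(archive_dir: str, oldest_needed: str) -> list:
    """Remove archived WAL segments that sort before oldest_needed.

    Files that do not look like WAL segments are left untouched.
    Returns the names that were removed, in sorted order.
    """
    if not WAL_NAME.fullmatch(oldest_needed):
        raise ValueError("oldest_needed must be a 24-character WAL file name")
    removed = []
    for name in sorted(os.listdir(archive_dir)):
        if WAL_NAME.fullmatch(name) and name < oldest_needed:
            os.unlink(os.path.join(archive_dir, name))
            removed.append(name)
    return removed
```

A job like this would have to run from cron (or similar), since nothing else
would clean the archive while SR is running normally.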

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
