Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
Date: 2010-03-18 14:27:59
Message-ID: 3f0b79eb1003180727g7877743eq81274e014fe70a49@mail.gmail.com
Lists: pgsql-committers pgsql-docs pgsql-hackers

On Wed, Mar 17, 2010 at 7:35 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>> I found another missing feature in new file-based log shipping (i.e.,
>> standby_mode is enabled and 'cp' is used as restore_command).
>>
>> After the trigger file is found, the startup process with pg_standby
>> tries to replay all of the WAL files in both pg_xlog and the archive.
>> So, when the primary fails, if the latest WAL file in pg_xlog of the
>> primary can be read, we can prevent the data loss by copying it to
>> pg_xlog of the standby before creating the trigger file.
>>
>> On the other hand, the startup process with standby mode doesn't
>> replay the WAL files in pg_xlog after the trigger file is found. So
>> failover always causes the data loss even if the latest WAL file can
>> be read from the primary. And if the latest WAL file is copied to the
>> archive instead, it can be replayed but a PANIC error would happen
>> because it's not filled.
>>
>> We should remove this restriction?
>
> Looking into this, I realized that we have a bigger problem related to
> this. Although streaming replication stores the streamed WAL files in
> pg_xlog, so that they can be re-replayed after a standby restart without
> connecting to the master, we don't try to replay those either. So if you
> restart standby, it will fail to start up if the WAL it needs can't be
> found in archive or by connecting to the master. That must be fixed.

I agree that this is the bigger problem. Because the standby always starts
walreceiver before replaying any WAL files in pg_xlog, walreceiver tries
to receive the WAL files following the REDO starting point even if they
are already present in pg_xlog. IOW, the same WAL files might be shipped
from the primary to the standby many times. That is wasteful and should
be addressed.
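
To make the redundancy concrete, here is a minimal sketch in Python. It is
not PostgreSQL code. It assumes WAL segments can be modeled as consecutive
integers and that every segment listed as local is complete; the function
names are hypothetical. It compares requesting everything from the REDO
starting point with requesting only from the first segment that is missing
from pg_xlog.

```python
def first_missing_segment(redo_start, local_segments):
    """Return the first segment at or after redo_start that is not in pg_xlog.

    Assumption: segments are plain consecutive integers, and every entry in
    local_segments is a complete, replayable segment.
    """
    seg = redo_start
    while seg in local_segments:
        seg += 1
    return seg


def segments_requested_always_from_redo(redo_start, primary_latest):
    """Behavior described above: always ask the primary from the REDO point."""
    return list(range(redo_start, primary_latest + 1))


def segments_requested_local_first(redo_start, local_segments, primary_latest):
    """Alternative: replay local segments first, then ask only for the rest."""
    start = first_missing_segment(redo_start, local_segments)
    return list(range(start, primary_latest + 1))


if __name__ == "__main__":
    local = {5, 6, 7}  # hypothetical segments already in the standby's pg_xlog
    print(segments_requested_always_from_redo(5, 9))
    print(segments_requested_local_first(5, local, 9))
```

With segments 5 to 7 already in pg_xlog and the primary at segment 9, the
first function asks for five segments and the second for only two. Real
streaming resumes from a WAL position rather than a whole segment, so this
only illustrates the idea.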

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
