Re: Unnecessary WAL archiving after failover

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unnecessary WAL archiving after failover
Date: 2012-06-05 21:12:37
Message-ID: CA+U5nMKYR_d47SVvZgVpMyrAqyCPBo7u-aNtSN5Rqk4+fzHq2A@mail.gmail.com
Lists: pgsql-hackers

On 23 March 2012 14:03, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Mar 22, 2012 at 12:56 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Feb 29, 2012 at 5:48 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Hi,
>>>
>>> In streaming replication, after failover, the new master might have lots
>>> of un-applied WAL files with the old timeline ID. They are the WAL files
>>> which were recycled as future ones when the server was running as a
>>> standby. Since they will never be used later, they don't need to be
>>> archived after failover. But since they have neither a .ready nor a .done
>>> file in archive_status, checkpoints after failover newly create .ready
>>> files for them, and then they are finally archived, which might cause
>>> a disk I/O spike on both WAL and archive storage.
>>>
>>> To avoid the above problem, I think that un-applied WAL files with the old
>>> timeline ID should be marked as already-archived and recycled immediately
>>> at the end of recovery. Thoughts?
>>
>> I'm not an expert on this, but that makes sense to me.
>
> Thanks for agreeing with my idea.
>
> On second thought, I found other issues about WAL archiving after
> failover. So let me clarify the issues again.
>
> Just after failover, there can be three kinds of WAL files in the new
> master's pg_xlog directory:
>
> (1) WAL files which were recycled by a restartpoint
>
> I've already explained upthread the issue which these WAL files cause
> after failover.

This might be a problem, or it might be archiving important data and
you have a corrupt WAL file/CRC. I'd rather take the hit than
delete potentially useful data. It also avoids introducing a bug that
deletes useful segments.

> (2) WAL files which were restored from the archive
>
> In 9.1 or before, the restored WAL files don't remain after failover
> because they are always restored onto the temporary filename
> "RECOVERYXLOG". So the issue I describe below doesn't exist
> in 9.1 or before.
>
> In 9.2dev, as a result of supporting cascading replication,
> an archived WAL file is restored under its correct file name so that
> a cascading walsender can send it to another standby. This restored
> WAL file has neither a .ready nor a .done archive status file. After
> failover, a checkpoint checks the archive status of the restored
> WAL file when attempting to recycle it, finds that it has neither
> .ready nor .done, and creates .ready. Because .ready exists,
> the file will be archived again even though it obviously already
> exists in the archival storage :(
>
> To prevent a restored WAL file from being archived again, I think
> that .done should be created whenever a WAL file is successfully
> restored (of course this should happen only when archive_mode is
> enabled). Thoughts?

Agreed
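
For illustration only, here is a rough Python sketch of the behaviour
described in (2). It is not PostgreSQL's C code: the function names
(checkpoint_can_recycle, mark_restored_as_done, demo) are hypothetical,
and the checkpoint logic is reduced to the status-file check discussed
above.

```python
import os
import tempfile


def status_path(xlog_dir, segment, suffix):
    # archive_status/<segment>.ready or archive_status/<segment>.done
    return os.path.join(xlog_dir, "archive_status", segment + suffix)


def checkpoint_can_recycle(xlog_dir, segment):
    """Simplified stand-in for the checkpoint's archive-status check.

    A segment is recyclable only if .done exists. If neither status file
    exists, .ready is created so the archiver will pick the segment up.
    """
    if os.path.exists(status_path(xlog_dir, segment, ".done")):
        return True
    ready = status_path(xlog_dir, segment, ".ready")
    if not os.path.exists(ready):
        open(ready, "w").close()
    return False


def mark_restored_as_done(xlog_dir, segment, archive_mode):
    """Proposed fix (hypothetical helper): create .done after a restore."""
    if archive_mode:
        open(status_path(xlog_dir, segment, ".done"), "w").close()


def demo():
    """Return {fix_applied: (recyclable, ready_file_created)}."""
    results = {}
    for fix_applied in (False, True):
        with tempfile.TemporaryDirectory() as xlog_dir:
            os.makedirs(os.path.join(xlog_dir, "archive_status"))
            segment = "000000010000000000000003"
            # Pretend the segment was just restored from the archive.
            open(os.path.join(xlog_dir, segment), "w").close()
            if fix_applied:
                mark_restored_as_done(xlog_dir, segment, archive_mode=True)
            recyclable = checkpoint_can_recycle(xlog_dir, segment)
            ready = os.path.exists(status_path(xlog_dir, segment, ".ready"))
            results[fix_applied] = (recyclable, ready)
    return results
```

Without the fix the restored segment gets a .ready file and is queued
for archiving again; with the fix the checkpoint finds .done and can
recycle the segment directly.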

> Since this is an oversight in cascading replication, I'm thinking of
> implementing the patch for 9.2dev.

Very much so.

> (3) WAL files which were streamed from the master
>
> These WAL files also don't have any archive status, so a checkpoint
> creates .ready for them after failover. Then all or many of
> them will be archived at once, which would cause an I/O spike on
> both WAL and archival storage.
>
> To avoid this problem, I think that we should change walreceiver
> so that it creates .ready as soon as it completes the WAL file. Also
> we should change the archiver process so that it starts up even in
> standby mode and archives the WAL files.
>
> If each server has its own archival storage, the above solution would
> work fine. But if all servers share the archival storage, multiple archiver
> processes in those servers might archive the same WAL file to
> the shared area at the same time. Is this OK? If not, to avoid this,
> we might need to separate archive_mode into two: one for normal mode
> (i.e., master), another for standby mode. If the archive is shared,
> we can ensure that only the archiver in the master copies a given
> WAL file by disabling WAL archiving in standby mode but
> enabling it in normal mode. Thoughts?

Use %s as an option to be passed to the archive command.
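
A minimal sketch of what such an archive script could look like,
written in Python for illustration. Assumptions, none of them from an
existing PostgreSQL release: %s is not a current archive_command
placeholder (only %p and %f are), so the "role" argument below is a
hypothetical stand-in for whatever %s would carry, and the function
name archive_segment is invented. The no-overwrite copy uses a
temporary file plus a hard link so that two archivers writing to a
shared directory cannot clobber each other.

```python
import os
import shutil
import tempfile


def archive_segment(src_path, archive_dir, role="master"):
    """Copy one WAL segment into archive_dir without overwriting.

    Returns "skipped" (standby, nothing copied), "copied", or "exists"
    (another archiver already stored a file under this name).
    `role` is a hypothetical argument, standing in for a value that a
    %s-style placeholder might pass to the script.
    """
    if role == "standby":
        return "skipped"

    dest = os.path.join(archive_dir, os.path.basename(src_path))
    # Write to a private temporary file in the same directory first.
    fd, tmp = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as out, open(src_path, "rb") as src:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())
        try:
            # os.link fails if dest already exists, so the final name
            # appears atomically and is never overwritten.
            os.link(tmp, dest)
        except FileExistsError:
            return "exists"
        return "copied"
    finally:
        os.unlink(tmp)
```

How "exists" should map to the command's exit status (treat as success,
or fail so the operator notices) is a policy choice this sketch leaves
to the caller.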

> Invoking the archiver process in standby mode is a new feature,
> not a bug fix. It's too late to propose a new feature for 9.2, so I'll
> propose this for 9.3.

Yep, good idea.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
