Re: Unnecessary WAL archiving after failover

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Unnecessary WAL archiving after failover
Date:	2012-05-02 23:18:15
Message-ID:	CA+Tgmoa5BBq_6LfUqa1v8HK1C-cT8UtjaTHFgPN4=6kWf2o95Q@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Mar 23, 2012 at 10:03 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On second thought, I found other issues about WAL archiving after
> failover. So let me clarify the issues again.
>
> Just after failover, there can be three kinds of WAL files in new
> master's pg_xlog directory:
>
> (1) WAL files which were recycled by restartpoint
>
> I've already explained upthread the issue which these WAL files cause
> after failover.

Check.

> (2) WAL files which were restored from the archive
>
> In 9.1 or before, the restored WAL files don't remain after failover
> because they are always restored onto the temporary filename
> "RECOVERYXLOG". So the issue which I explain from now doesn't exist
> in 9.1 or before.
>
> In 9.2dev, as the result of supporting cascade replication,
> an archived WAL file is restored onto correct file name so that
> cascading walsender can send it to another standby. This restored
> WAL file has neither .ready nor .done archive status file. After
> failover, checkpoint checks the archive status file of the restored
> WAL file to attempt to recycle it, finds that it has neither .ready
> nor .done, and creates .ready. Because of the existence of .ready,
> it will be archived again even though it obviously already exists in
> the archival storage :(
>
> To prevent a restored WAL file from being archived again, I think
> that .done should be created whenever WAL file is successfully
> restored (of course this should happen only when archive_mode is
> enabled). Thoughts?
>
> Since this is the oversight of cascade replication, I'm thinking to
> implement the patch for 9.2dev.

Yes, I think we had better fix this in 9.2. As you say, it's a loose
end from streaming replication. Do you have a patch?
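
To make the behaviour in (2) concrete, here is a minimal sketch of the archive status handling described above. It is a toy model in Python, not PostgreSQL's actual code: the function names (mark_restored, checkpoint_scan, demo) are hypothetical, and a temporary directory stands in for pg_xlog/archive_status. It only shows that a restored segment with neither status file gets queued for archiving again, and that creating .done at restore time prevents this.

```python
import os
import tempfile

READY, DONE = ".ready", ".done"


def _status_file(status_dir, segment, suffix):
    """Path of a status file such as <segment>.ready or <segment>.done."""
    return os.path.join(status_dir, segment + suffix)


def mark_restored(status_dir, segment, archive_mode=True):
    """Proposed fix (hypothetical helper): create .done when a segment is
    restored from the archive, but only if archiving is enabled."""
    if archive_mode:
        open(_status_file(status_dir, segment, DONE), "w").close()


def checkpoint_scan(status_dir, segments):
    """Behaviour described in the mail: a segment that has neither .ready
    nor .done gets a .ready file, which queues it for archiving."""
    queued = []
    for segment in segments:
        has_ready = os.path.exists(_status_file(status_dir, segment, READY))
        has_done = os.path.exists(_status_file(status_dir, segment, DONE))
        if not has_ready and not has_done:
            open(_status_file(status_dir, segment, READY), "w").close()
            queued.append(segment)
    return queued


def demo():
    """Return the segments queued without and with the proposed fix."""
    restored = "000000010000000000000003"  # example segment name
    with tempfile.TemporaryDirectory() as without_fix:
        requeued_without_fix = checkpoint_scan(without_fix, [restored])
    with tempfile.TemporaryDirectory() as with_fix:
        mark_restored(with_fix, restored)
        requeued_with_fix = checkpoint_scan(with_fix, [restored])
    return requeued_without_fix, requeued_with_fix
```

Calling demo() returns two lists: the segments queued for archiving without the fix, and the segments queued once .done is written at restore time.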

> (3) WAL files which were streamed from the master
>
> These WAL files also don't have any archive status, so checkpoint
> creates .ready for them after failover. And then, all or many of
> them will be archived at a time, which would cause I/O spike on
> both WAL and archival storage.
>
> To avoid this problem, I think that we should change walreceiver
> so that it creates .ready as soon as it completes the WAL file. Also
> we should change the archiver process so that it starts up even in
> standby mode and archives the WAL files.
>
> If each server has its own archival storage, the above solution would
> work fine. But if all servers share the archival storage, multiple archiver
> processes in those servers might archive the same WAL file to
> the shared area at the same time. Is this OK? If not, to avoid this,
> we might need to separate archive_mode into two: one for normal mode
> (i.e., master), another for standby mode. If the archive is shared,
> we can ensure that only one archiver in the master copies the WAL file
> at the same time by disabling WAL archiving in standby mode but
> enabling it in normal mode. Thoughts?

Another option would be to run the archiver in both modes and somehow
pass a flag indicating whether it's running in standby mode or normal
running.
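
The two options discussed for (3) can be sketched side by side. This is only an illustration under stated assumptions: ArchiveSettings, should_archive and make_shared_archive_command are hypothetical names, no such settings or flag exist in PostgreSQL as of this thread, and a Python set stands in for the shared archival storage.

```python
from dataclasses import dataclass


@dataclass
class ArchiveSettings:
    """Option A (hypothetical): archive_mode split into one switch per mode."""
    archive_in_normal_mode: bool
    archive_in_standby_mode: bool


def should_archive(settings, in_standby):
    """Decide whether the archiver copies files in the current mode."""
    if in_standby:
        return settings.archive_in_standby_mode
    return settings.archive_in_normal_mode


def make_shared_archive_command(archive):
    """Option B (hypothetical): the archiver runs in both modes and passes a
    standby flag, leaving the decision to the archive command itself."""
    def command(segment, in_standby):
        if in_standby:
            # Assumption: with a shared archive the master copies the file,
            # so the standby reports success without writing anything.
            return True
        archive.add(segment)
        return True
    return command
```

With a shared archive, Option A would be configured as ArchiveSettings(archive_in_normal_mode=True, archive_in_standby_mode=False). Option B keeps a single setting but moves the policy into the command. Whether a standby may safely treat a segment as archived without copying it depends on the master having archived it, which this sketch does not address.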

> Invoking the archiver process in standby mode is a new feature,
> not a bug fix. It's too late to propose a new feature for 9.2. So I'll
> propose this for 9.3.

OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Unnecessary WAL archiving after failover at 2012-03-23 14:03:27 from Fujii Masao
