On Thu, Mar 22, 2012 at 12:56 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Feb 29, 2012 at 5:48 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> In streaming replication, after failover, new master might have lots of
>> un-applied WAL files with old timeline ID. They are the WAL files which
>> were recycled as future ones when the server was running as a standby.
>> Since they will never be used later, they don't need to be archived after
>> failover. But since they have neither .ready nor .done file in
>> archive_status, checkpoints after failover newly create .ready files for
>> them, and then finally they are archived. Which might cause disk I/O spike
>> both in WAL and archive storage.
>> To avoid the above problem, I think that un-applied WAL files with old
>> timeline ID should be marked as already-archived and recycled immediately
>> at the end of recovery. Thoughts?
> I'm not an expert on this, but that makes sense to me.
Thanks for agreeing with my idea.
On second thought, I found other issues with WAL archiving after
failover. So let me clarify the issues again.
Just after failover, there can be three kinds of WAL files in the new
master's pg_xlog directory:
(1) WAL files which were recycled by restartpoint
I've already explained upthread the issue which these WAL files cause.
(2) WAL files which were restored from the archive
In 9.1 or before, the restored WAL files don't remain after failover
because they are always restored under the temporary file name
"RECOVERYXLOG". So the issue I'm about to describe doesn't exist
in 9.1 or before.
In 9.2dev, as a result of supporting cascading replication,
an archived WAL file is restored under its correct file name so that
a cascading walsender can send it to another standby. This restored
WAL file has neither a .ready nor a .done archive status file. After
failover, a checkpoint checks the archive status file of the restored
WAL file to attempt to recycle it, finds that it has neither .ready
nor .done, and creates .ready. Because .ready exists,
the file will be archived again even though it obviously already exists in
the archival storage :(
To prevent a restored WAL file from being archived again, I think
that .done should be created whenever a WAL file is successfully
restored (of course this should happen only when archive_mode is
enabled).
Since this is an oversight in cascading replication, I'm planning to
write a patch for 9.2dev.
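
To make the problem and the proposed fix for (2) concrete, here is a
rough Python model. It is not PostgreSQL code: the function names
checkpoint_mark and mark_restored are made up for illustration, and a
temporary directory stands in for archive_status.

```python
import os
import tempfile


def status_path(status_dir, segment, suffix):
    """Path of an archive status file such as <segment>.ready."""
    return os.path.join(status_dir, segment + suffix)


def checkpoint_mark(status_dir, segment):
    """Model of the checkpoint behaviour described above (hypothetical name).

    A segment with neither .ready nor .done gets a .ready file, which
    makes the archiver pick it up. Returns the resulting status.
    """
    ready = status_path(status_dir, segment, ".ready")
    done = status_path(status_dir, segment, ".done")
    if not os.path.exists(ready) and not os.path.exists(done):
        open(ready, "w").close()
    return "ready" if os.path.exists(ready) else "done"


def mark_restored(status_dir, segment, archive_mode):
    """Proposed fix (hypothetical name): create .done after a successful
    restore, but only when archive_mode is enabled."""
    if archive_mode:
        open(status_path(status_dir, segment, ".done"), "w").close()


def demo():
    with tempfile.TemporaryDirectory() as status_dir:
        # Restored segment without any status file: it would be re-archived.
        unmarked = checkpoint_mark(status_dir, "000000010000000000000005")
        # Restored segment marked .done at restore time: checkpoint leaves it.
        mark_restored(status_dir, "000000010000000000000006", archive_mode=True)
        marked = checkpoint_mark(status_dir, "000000010000000000000006")
        return unmarked, marked
```

In this model, only the segment that was not marked at restore time ends
up with a .ready file.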
(3) WAL files which were streamed from the master
These WAL files also don't have any archive status file, so a checkpoint
creates .ready for them after failover. Then all or many of
them will be archived at once, which would cause an I/O spike on
both WAL and archival storage.
To avoid this problem, I think that we should change walreceiver
so that it creates .ready as soon as it completes the WAL file. Also
we should change the archiver process so that it starts up even in
standby mode and archives the WAL files.
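
As a sketch of the walreceiver change, the following Python model marks
a segment as ready the moment the received byte position crosses its
end, instead of leaving all segments unmarked until failover. The class
WalReceiverModel and its fields are invented for illustration; only the
16 MB default segment size comes from PostgreSQL.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size in bytes


class WalReceiverModel:
    """Toy model of a walreceiver that marks completed segments (hypothetical)."""

    def __init__(self):
        self.received = 0   # total bytes received so far
        self.ready = []     # segment numbers that would get a .ready file

    def receive(self, nbytes):
        """Account for nbytes of streamed WAL and mark every segment that
        this chunk completes. Returns a copy of the ready list."""
        first_open = self.received // SEGMENT_SIZE
        self.received += nbytes
        now_open = self.received // SEGMENT_SIZE
        # Segments first_open .. now_open-1 were filled by this chunk.
        for segno in range(first_open, now_open):
            self.ready.append(segno)
        return list(self.ready)
```

With this approach the archiver sees segments one at a time as they
complete, so the archiving work is spread out rather than bunched up
after failover.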
If each server has its own archival storage, the above solution would
work fine. But if all servers share the archival storage, multiple archiver
processes on those servers might archive the same WAL file to
the shared area at the same time. Is this OK? If not, to avoid this,
we might need to separate archive_mode into two: one for normal mode
(i.e., master), another for standby mode. If the archive is shared,
we can ensure that only the archiver on the master copies a given WAL file
by disabling WAL archiving in standby mode but
enabling it in normal mode. Thoughts?
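
The split setting could behave roughly like the Python sketch below. The
parameter names archive_in_normal and archive_in_standby are purely
hypothetical stand-ins for the two halves of archive_mode; no such
settings exist as written here.

```python
def archiver_should_run(in_standby_mode, archive_in_normal, archive_in_standby):
    """Decide whether a server runs its archiver (hypothetical settings).

    archive_in_normal applies while the server is the master;
    archive_in_standby applies while it is in standby mode.
    """
    return archive_in_standby if in_standby_mode else archive_in_normal


def count_archivers(servers, archive_in_normal, archive_in_standby):
    """Number of servers whose archiver would run. Each entry in servers
    is True if that server is in standby mode, False if it is the master."""
    return sum(
        archiver_should_run(standby, archive_in_normal, archive_in_standby)
        for standby in servers
    )
```

For a shared archive with one master and two standbys,
count_archivers([False, True, True], True, False) gives 1, so only the
master writes to the shared area. With per-server archives, enabling
both settings lets all three archivers run.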
Invoking the archiver process in standby mode is a new feature,
not a bug fix. It's too late to propose a new feature for 9.2, so I'll
propose this for 9.3.
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center