Re: Avoiding shutdown checkpoint at failover

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding shutdown checkpoint at failover
Date: 2012-01-18 07:15:21
Message-ID: CAHGQGwHJVQYn=h9O1t_h73_aCV1Af_isW2LUzu9yEUjZ4fjT+A@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 13, 2011 at 5:13 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Tue, Nov 1, 2011 at 12:11 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> When I say skip the shutdown checkpoint, I mean remove it from the
>> critical path of required actions at the end of recovery. We can still
>> have a normal checkpoint kicked off at that time, but that no longer
>> needs to be on the critical path.
>>
>> Any problems foreseen? If not, looks like a quick patch.
>
> Patch attached for discussion/review.

This feature is what I want; it would be very helpful for shortening
failover time in streaming replication.

Here are my review comments, though I have not yet checked thoroughly
whether this feature works correctly in all recovery scenarios.

LocalSetXLogInsertAllowed() must be called before LogEndOfRecovery().
LocalXLogInsertAllowed must be set to -1 after LogEndOfRecovery().
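
A minimal sketch of that ordering, using simplified stand-ins rather than
the real xlog.c code. It assumes the tri-state convention of
LocalXLogInsertAllowed (-1 = defer to the shared recovery state, 0 = never
insert, 1 = always insert); `shared_recovery_in_progress`,
`records_written` and `end_recovery_without_checkpoint()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for xlog.c's LocalXLogInsertAllowed:
 *   -1 = defer to the shared state (still "in recovery" in this sketch),
 *    0 = WAL insertion forbidden, 1 = allowed for this process. */
static int  LocalXLogInsertAllowed = -1;
static bool shared_recovery_in_progress = true; /* hypothetical shared flag */
static int  records_written = 0;                /* hypothetical counter */

static bool xlog_insert_allowed(void)
{
    if (LocalXLogInsertAllowed >= 0)
        return LocalXLogInsertAllowed == 1;
    return !shared_recovery_in_progress;
}

static void LocalSetXLogInsertAllowed(void)
{
    assert(LocalXLogInsertAllowed == -1);
    LocalXLogInsertAllowed = 1;
}

static void LogEndOfRecovery(void)
{
    /* Writing WAL while the shared state still says "in recovery" is only
     * legal once the local override has been set. */
    assert(xlog_insert_allowed());
    records_written++;          /* stand-in for inserting the record */
}

/* Ordering proposed in the review: allow, log, then reset to -1 so this
 * process falls back to the shared state again. */
static void end_recovery_without_checkpoint(void)
{
    LocalSetXLogInsertAllowed();
    LogEndOfRecovery();
    LocalXLogInsertAllowed = -1;
}
```

Resetting to -1 rather than 0 matters: 0 would forbid WAL insertion
outright, whereas -1 lets the process follow the shared state once
recovery is officially over.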

The XLOG_END_OF_RECOVERY record is written to the WAL file of the newly
assigned timeline ID, but it must be written to the WAL file of the old
one. Otherwise, when re-entering recovery after failover, we cannot find
the XLOG_END_OF_RECOVERY record at all.
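
Why the timeline matters here: the WAL segment file name begins with the
timeline ID, so the same WAL position maps to a different file on each
timeline. A minimal sketch, assuming the 9.x-era naming scheme (timeline,
log ID and segment, each as 8 uppercase hex digits); `wal_file_name()` and
`reader_sees_record()` are hypothetical helpers, not PostgreSQL functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* WAL segment file name: timeline, log and segment, 8 hex digits each. */
static void wal_file_name(char *buf, size_t len, uint32_t tli,
                          uint32_t log, uint32_t seg)
{
    assert(len >= 25);  /* 24 hex digits plus the terminating NUL */
    snprintf(buf, len, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32, tli, log, seg);
}

/* True if a reader following reader_tli opens the same segment file
 * that a record written on writer_tli went into. */
static bool reader_sees_record(uint32_t writer_tli, uint32_t reader_tli,
                               uint32_t log, uint32_t seg)
{
    char written[25], opened[25];

    wal_file_name(written, sizeof written, writer_tli, log, seg);
    wal_file_name(opened, sizeof opened, reader_tli, log, seg);
    return strcmp(written, opened) == 0;
}
```

For example, a record written to 000000020000000000000003 is invisible to
a recovery that follows timeline 1 and therefore opens
000000010000000000000003.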

Before the XLOG_END_OF_RECOVERY record is written,
RmgrTable[rmid].rm_cleanup() might write WAL records. They should also
be written to the WAL file with the old timeline ID.
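
The ordering these two points imply: everything emitted at the end of
recovery, including rm_cleanup() output, goes to the old timeline, and
only afterwards does the server switch to the new one. A simplified
sketch of the review's proposal, not of xlog.c; `insert_record()`,
`rm_cleanup_stub()` and the per-record timeline log are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RECORDS 16

typedef uint32_t TimeLineID;

static TimeLineID ThisTimeLineID = 1;          /* timeline WAL goes to now */
static TimeLineID record_tli[MAX_RECORDS];     /* timeline of each record */
static int        n_records = 0;

/* Hypothetical WAL insert: remember which timeline's file got the record. */
static void insert_record(void)
{
    assert(n_records < MAX_RECORDS);
    record_tli[n_records++] = ThisTimeLineID;
}

/* Stand-in for a resource manager's rm_cleanup() that emits WAL. */
static void rm_cleanup_stub(void)
{
    insert_record();
}

static void end_recovery(TimeLineID new_tli, int n_rmgrs)
{
    /* 1. rm_cleanup() callbacks may insert WAL: still the old timeline. */
    for (int i = 0; i < n_rmgrs; i++)
        rm_cleanup_stub();

    /* 2. The XLOG_END_OF_RECOVERY record: also the old timeline. */
    insert_record();

    /* 3. Only now switch, so later WAL goes to the new timeline. */
    ThisTimeLineID = new_tli;
}
```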

When a recovery target is specified, we cannot write new WAL to the file
with the old timeline, because doing so would overwrite valid WAL records
in it with the new WAL. So when a recovery target is specified, ISTM that
we cannot skip the end-of-recovery checkpoint. Alternatively, we might
need to save all information about timelines in the database cluster
instead of writing an XLOG_END_OF_RECOVERY record, and use it when
re-entering recovery.
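
The resulting rule, as a minimal sketch: the checkpoint can come off the
critical path only when recovery replayed all available WAL. The enum is
a hypothetical, simplified stand-in for the recovery.conf target settings
(recovery_target_xid, recovery_target_time, recovery_target_name).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for which recovery target, if any, was set. */
typedef enum
{
    TARGET_UNSET,   /* no target: replay all available WAL */
    TARGET_XID,     /* recovery_target_xid */
    TARGET_TIME,    /* recovery_target_time */
    TARGET_NAME     /* recovery_target_name */
} RecoveryTargetKind;

static bool can_skip_end_of_recovery_checkpoint(RecoveryTargetKind target)
{
    assert(target <= TARGET_NAME);

    /* With a target, replay stops before the end of the old timeline's WAL;
     * writing more WAL there would overwrite valid records, so keep the
     * end-of-recovery checkpoint. */
    return target == TARGET_UNSET;
}
```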

LogEndOfRecovery() seems to need to call XLogFlush(). Otherwise, what
happens if the server crashes after the new timeline history file is
created and recovery.conf is removed, but before the XLOG_END_OF_RECOVERY
record has been flushed to disk?
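
The crash window closes if the record is durable before recovery.conf
goes away. A minimal POSIX sketch of that write, fsync, then rename
ordering; the directory layout, the `wal_segment` file name and both
helpers are hypothetical, and fsync() merely stands in for XLogFlush().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Append a record and force it to disk (stand-in for XLogFlush()). */
static int write_and_flush(const char *path, const char *rec, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, rec, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical end of recovery: the end-of-recovery record must be on disk
 * before recovery.conf disappears, so a crash in between still leaves
 * recovery.conf in place. */
static int finish_recovery(const char *dir, const char *rec, size_t len)
{
    char wal[512], conf[512], done[512];

    assert(dir != NULL && rec != NULL);
    snprintf(wal, sizeof wal, "%s/wal_segment", dir);
    snprintf(conf, sizeof conf, "%s/recovery.conf", dir);
    snprintf(done, sizeof done, "%s/recovery.done", dir);

    if (write_and_flush(wal, rec, len) != 0)
        return -1;              /* record not durable: keep recovery.conf */
    return rename(conf, done);  /* only after the flush succeeded */
}
```

A production version would also fsync() the containing directory after
rename() so that the rename itself survives a crash.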

During recovery, when we replay the XLOG_END_OF_RECOVERY record, we
should close the currently open WAL file and read the WAL file of the
timeline that the XLOG_END_OF_RECOVERY record indicates. Otherwise, when
re-entering recovery on the old timeline, we cannot reach the new
timeline.
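
The replay side, as a minimal sketch: when the end-of-recovery record is
replayed, the reader drops the old timeline's segment and continues from
the new timeline's file at the same position. `WalReader` and its helpers
are hypothetical stand-ins, not PostgreSQL's WAL reading code; "opening" a
file is modeled by recording its name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    uint32_t tli;           /* timeline currently being read */
    uint32_t log, seg;      /* current segment position */
    char     open_file[25]; /* name of the "open" segment file, "" if none */
} WalReader;

static void reader_open(WalReader *r)
{
    snprintf(r->open_file, sizeof r->open_file,
             "%08" PRIX32 "%08" PRIX32 "%08" PRIX32, r->tli, r->log, r->seg);
}

static void reader_close(WalReader *r)
{
    memset(r->open_file, 0, sizeof r->open_file);
}

/* Replay of an XLOG_END_OF_RECOVERY record that names new_tli. */
static void replay_end_of_recovery(WalReader *r, uint32_t new_tli)
{
    assert(new_tli > r->tli);   /* timeline IDs only increase */
    reader_close(r);            /* stop reading the old timeline's file */
    r->tli = new_tli;
    reader_open(r);             /* continue in the new timeline's file */
}
```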

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
