Re: Switching timeline over streaming replication

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: 'PostgreSQL-development' <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Switching timeline over streaming replication
Date: 2012-09-25 07:08:41
Message-ID: 50615879.2090608@vmware.com
Lists: pgsql-hackers

On 24.09.2012 16:33, Amit Kapila wrote:
> On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:
>> I've been working on the often-requested feature to handle timeline
>> changes over streaming replication. At the moment, if you kill the
>> master and promote a standby server, and you have another standby
>> server that you'd like to keep following the new master server, you
>> need a WAL archive in addition to streaming replication to make it
>> cross the timeline change. Streaming replication will just error out.
>> Having a WAL archive is usually a good idea in complex replication
>> scenarios anyway, but it would be good to not require it.
>
> Please confirm my understanding of this feature:
>
> This feature is for the case where standby-1, which is going to be
> promoted to master, has archive_mode 'on'.

No. This is for the case where there is no WAL archive.
archive_mode='off' on all servers.

Or to be precise, you can also have a WAL archive, but this patch
doesn't affect that in any way. This is strictly about streaming
replication.

> As in that case only its timeline will change.

The timeline changes whenever you promote a standby. It's not related to
whether you have a WAL archive or not.
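
To make the timeline change concrete, here is a minimal sketch (not code
from the patch) of how the timeline ID shows up in WAL segment file names:
the first 8 hex digits are the timeline ID and the remaining 16 identify
the segment. The helper names wal_file_name() and same_wal_range() are
hypothetical, and the constant assumes the default 16 MB segment size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default 16 MB WAL segments, i.e. 256 segments per 4 GB of WAL. */
#define SEGMENTS_PER_XLOG_ID 256u
#define WAL_FILE_NAME_LEN 24

/* Hypothetical helper: build a 24-hex-digit segment name, timeline ID first. */
static void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    assert(buf != NULL && len > WAL_FILE_NAME_LEN);
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / SEGMENTS_PER_XLOG_ID),
             (unsigned int) (segno % SEGMENTS_PER_XLOG_ID));
}

/*
 * Hypothetical helper: true if two segment names cover the same WAL range,
 * i.e. they differ at most in the leading 8-digit timeline ID.
 */
static int
same_wal_range(const char *a, const char *b)
{
    return strcmp(a + 8, b + 8) == 0;
}
```

With these assumptions, segment 5 is named 000000010000000000000005 on
timeline 1 and 000000020000000000000005 on timeline 2. A promoted standby
starts writing under the new, higher timeline ID, so a standby that is
still following the old timeline has to learn about the switch from
somewhere.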

> If the above is right, then there are other similar scenarios where it
> can be used:
>
> Scenario-1 (1 Master, 1 Stand-by)
> 1. Master (archive_mode=on) goes down.
> 2. Master again comes up
> 3. Stand-by tries to follow it
>
> Now in the above scenario it also gives an error due to the timeline
> mismatch, but your patch should fix it.

If the master simply crashes or is shut down, and then restarted, the
timeline doesn't change. The standby will reconnect / poll the archive,
and sync up just fine, even without this patch.

> However, I am not sure about splitting RestoreArchivedFile() and
> ExecuteRecoveryCommand() into a separate file.
> How about splitting out all archive-related functions:
> static void XLogArchiveNotify(const char *xlog);
> static void XLogArchiveNotifySeg(XLogSegNo segno);
> static bool XLogArchiveCheckDone(const char *xlog);
> static bool XLogArchiveIsBusy(const char *xlog);
> static void XLogArchiveCleanup(const char *xlog);

Hmm, sounds reasonable.

> In any case, it will be better if you can split it into multiple patches:
> 1. Having new functionality of "Switching timeline over streaming
> replication"
> 2. Refactoring related changes.
>
> It can make my testing and review of the new feature patch a little easier.

Yep, I'll go ahead and split the patch. Thanks!

- Heikki
