Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: depesz(at)depesz(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc
Date: 2012-12-17 12:01:20
Message-ID: 50CF0990.8020506@iki.fi
Lists: pgsql-committers pgsql-general

On 15.12.2012 17:06, hubert depesz lubaczewski wrote:
> I might be missing something, but what exactly does that commit give us?
>
> I mean - we were able, previously, to make slave switch to new master
> - as Phil Sorber described in here:
> http://philsorber.blogspot.com/2012/03/what-to-do-when-your-timeline-isnt.html
>
> After some talk on IRC, I understood that this patch will make it
> possible to switch to new master in plain SR replication, with no WAL
> archive (because if you have wal archive, you can use the method Phil
> described, which basically "just works").

Right, that's exactly the point of the patch. A WAL archive is no longer
necessary for failover.

> So I did setup three machines: master and two slaves.
> Master had 2 IPs - its own, and a floating one.
> Both slaves were connecting to the floating one, and recovery.conf
> looked like:
> ---------
> standby_mode = 'on'
> primary_conninfo = 'port=5920 user=replication host=172.28.173.253'
> trigger_file = '/tmp/finish.replication'
> recovery_target_timeline='latest'
> ---------
>
> After I verified that replication works to both slaves, I did failover one of
> the slaves, shut down master, and did ip takeover of floating ip to the slave
> that did takeover.

Hmm, is it possible that some WAL was generated in the old master, and
streamed to the standby, after the new master was already promoted? It's
important to kill the old master before promoting the new master.
Otherwise the timelines diverge, so that you have some WAL on the old
timeline that's not present in the new master, and some WAL in the new
master's timeline that's not present in the old master. In that
situation, if the standby has already replicated the WAL from the old
master, it can no longer start to follow the new master. I think that
would match the symptoms you're seeing.
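
To make that condition concrete, here is a minimal sketch in Python. It is a
simplified model of the reasoning above, not the server's actual logic. The
function names are hypothetical, and the LSN values in the usage note are
made up for illustration. It assumes WAL positions are written in the usual
'hi/lo' hexadecimal form.

```python
def parse_lsn(text: str) -> int:
    """Convert a WAL position written as 'hi/lo' in hex into one integer.

    Assumption: the two halves are 32 bits each, so the result can be
    compared with ordinary integer comparison.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def can_follow_new_timeline(standby_replayed_lsn: str, switch_point_lsn: str) -> bool:
    """Return True if the standby can still switch to the new timeline.

    Simplified model: the new master's timeline forks off the old one at
    switch_point_lsn. If the standby has replayed WAL from the old master
    beyond that point, it holds WAL that does not exist on the new
    timeline, so it cannot follow the new master.
    """
    return parse_lsn(standby_replayed_lsn) <= parse_lsn(switch_point_lsn)
```

With a hypothetical switch point of 0/3000000, a standby that stopped at
0/2FFFFF0 could still follow the new master, while one that had already
replayed up to 0/3000060 from the old master could not.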

I wouldn't rule out a bug in the patch either, though. Amit found a
worrying number of bugs in his testing, and although we stamped out all
the known bugs, it wouldn't surprise me if there are more :-(

- Heikki
