Re: Cascading replication and recovery_target_timeline='latest'

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cascading replication and recovery_target_timeline='latest'
Date: 2012-09-05 00:34:59
Message-ID: 50469E33.902@iki.fi
Lists: pgsql-hackers

On 04.09.2012 16:50, Tom Lane wrote:
> Josh Berkus<josh(at)agliodbs(dot)com> writes:
>> Heikki,
>>> It is for 9.2. I'll do a little bit more testing, and barring any
>>> issues, commit the patch. What exactly is the schedule? Do we need to do
>>> a RC2 because of this?
>
>> We're currently scheduled to release next week. If we need to do an
>> RC2, we're going to have to do some fast rescheduling; we've already
>> started the publicity machine.
>
> At this point I would argue that the only thing that should abort the
> launch is a bad regression. Minor bugs in new features (and this must
> be minor if it wasn't noticed before) don't qualify.
>
> Having said that, it'd be good to get it fixed if we can. The schedule
> says to wrap 9.2.0 Thursday evening --- Heikki, can you get this fixed
> tomorrow (Wednesday)?

The attached patch fixes it for me. It fixes the original problem by
adding the missing locking and terminating walsenders on a target
timeline change, and it also changes the behavior with respect to WAL
segments restored from the archive, as I just suggested in another email
(http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php).

The test case I've been using is a master and two standbys. The first
standby is set up to connect to the master with streaming replication,
and the other standby is set up to connect to the 1st standby, i.e. it's
a cascading slave. In addition, the master is set up to do WAL archiving
to a directory, and both standbys have a restore_command to read from
that archive, and recovery_target_timeline='latest'. After the master and
both standbys are running, I create a dummy recovery.conf file in
master's data directory, with just "restore_command='/bin/false'" in it,
and restart the master. That forces a timeline change in the master.
With the patch, the 1st standby will notice the new timeline in the
archive, switch to that, and reconnect to the master. The cascading
connection to the 2nd standby is terminated because of the timeline
change; the 2nd standby will then also scan the archive, pick up the new
timeline, reconnect to the 1st standby, and be in sync again.
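
For anyone wanting to reproduce this, the setup above could look roughly
like the sketch below. This is only an illustration: the archive path,
host names, port numbers and the max_wal_senders value are placeholders
I'm assuming here, not taken from the actual test machines; only the
restore_command='/bin/false' line and recovery_target_timeline='latest'
come straight from the description.

```
# --- master: postgresql.conf (archive path is a placeholder) ---
wal_level = hot_standby
archive_mode = on
archive_command = 'cp %p /path/to/archive/%f'
max_wal_senders = 3

# --- 1st standby: postgresql.conf (must accept the cascading connection) ---
hot_standby = on
max_wal_senders = 3

# --- 1st standby: recovery.conf ---
standby_mode = 'on'
primary_conninfo = 'host=localhost port=5432'   # assumed: the master
restore_command = 'cp /path/to/archive/%f %p'
recovery_target_timeline = 'latest'

# --- 2nd standby: recovery.conf ---
standby_mode = 'on'
primary_conninfo = 'host=localhost port=5433'   # assumed: the 1st standby
restore_command = 'cp /path/to/archive/%f %p'
recovery_target_timeline = 'latest'

# --- master: dummy recovery.conf, created before restarting the master ---
restore_command = '/bin/false'
```

The dummy recovery.conf makes the master go through archive recovery on
restart; since the restore_command always fails, it ends recovery right
away and comes up on a new timeline.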

- Heikki

Attachment Content-Type Size
disconnect-walsenders-on-target-tli-change-2.patch text/x-diff 6.5 KB
