Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave
Date: 2013-01-17 23:24:31
Message-ID: CAB7nPqQF_F5eJ7iQM9BW-Au6061CH5osL0J4HmsHFazCwgoKQA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 18, 2013 at 3:05 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> I encountered the problem that the timeline switch is not performed
> as expected.
> I set up one master, one standby and one cascade standby. All the servers
> share the archive directory. restore_command is specified in the
> recovery.conf of both standbys.
>
> I shut down the master, and then promoted the standby. In this case, the
> cascade standby should switch to the new timeline and replication should be
> successfully restarted. But the timeline was never changed, and the
> following log messages kept being output.
>
> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
> sby2 LOG: replication terminated by primary server
> sby2 DETAIL: End of WAL reached on timeline 1
> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
> sby2 LOG: replication terminated by primary server
> sby2 DETAIL: End of WAL reached on timeline 1
> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
> sby2 LOG: replication terminated by primary server
> sby2 DETAIL: End of WAL reached on timeline 1
>
I am seeing similar issues with master at 88228e6.
This is easily reproducible: set up 2 slaves under a master, then kill
the master, promote slave 1 and reconnect slave 2 to slave 1. You will
notice that the timeline jump is not done.

I don't know whether Masao tried to make the slave that reconnects to the
promoted slave synchronous, but in that case slave2 gets stuck in the
"potential" state. That is due to the timeline not having changed on
slave2, but it is better to let you know...

The replication delays are still here.
--
Michael Paquier
http://michael.otacoo.com
