Re: Inconsistent DB data in Streaming Replication

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Sameer Thakur <samthakur74(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, sthomas(at)optionshouse(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inconsistent DB data in Streaming Replication
Date: 2013-04-12 10:57:19
Message-ID: 20130412105719.GA5766@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-04-12 02:29:01 +0900, Fujii Masao wrote:
> On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> >
> > You just shut down the old master and let the standby catch
> > up (takas a few microseconds ;) ) before you promote it.
> >
> > After this you can start up the former master with recovery.conf
> > and it will follow nicely.
>
> No. When you shut down the old master, it might not have been
> able to send all the WAL records to the standby. I have observed
> this situation several times. So in your approach, new standby
> might fail to catch up with the master nicely.

It seems most of this thread is focusing on the wrong thing then. If we
really are only talking about planned failover then we need to solve
*that* not some ominous "don't flush data too early" which has
noticeable performance and implementation complexity problems.

I guess youre observing that not everything is replicated because youre
doing an immediate shutdown - probably because performing the shutdown
checkpoint would take too long. This seems solveable by implementing a
recovery connection command which initiates a shutdown that just
disables future WAL inserts and returns the last lsn that has been
written. Then you can fall over as soon as that llsn has been reached
and can make the previous master follow from there on without problems.

You could even teach the standby not to increment the timeline in that
case since thats safe.

The biggest issue seems to be how to implement this without another
spinlock acquisition for every XLogInsert(), but that seems possible.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-04-12 10:59:45 Re: Inconsistent DB data in Streaming Replication
Previous Message Shuai Fan 2013-04-12 10:34:42 Re: [GSOC] questions about idea "rewrite pg_dump as library"