Re: Inconsistent DB data in Streaming Replication

From: Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, ants(at)cybertec(dot)at, andres(at)2ndquadrant(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inconsistent DB data in Streaming Replication
Date: 2013-04-09 06:42:54
Message-ID: CAF8Q-GwH0N7yFUT+QophzsC5z7+7KxRjWPdTUASGzvaO2rgyxw@mail.gmail.com
Lists: pgsql-hackers

>What Samrat is proposing here is that WAL is not flushed to the OS before
>it is acked by a synchronous replica so recovery won't go past the
>timeline change made in failover, making it necessary to take a new
>base backup to resync with the new master.

Actually, we are proposing that the data page on the master is not written
until the master receives an ACK from the standby. The WAL can be flushed to
disk on both the master and the standby before the standby sends its ACK to
the master. The end objective is the same: to avoid taking a fresh base
backup in order to resync the old master with the new master.

>Why do you think that the inconsistent data after failover happens is
>problem? Because it's one of the reasons why a fresh base backup is
>required when starting old master as new standby? If yes, I agree with
>you. I've often heard the complaints about a backup when restarting new
>standby. That's really big problem.

Yes, taking a backup is a major problem when the database is larger than
several TB. It would take a very long time to ship the backup data over a
slow WAN link.

>> One solution to avoid this situation is have the master send WAL records
>> to standby and wait for ACK from standby committing WAL files to disk and
>> only after that commit data page related to this transaction on master.

>You mean to make the master wait the data page write until WAL has been
>not only flushed to disk but also replicated to the standby?

Yes. The master should not write a data page before the corresponding WAL
records have been replicated to the standby. By that point the WAL records
have been flushed to disk on both the master and the standby.
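
To make the ordering concrete, here is a rough sketch of the rule in Python.
It is only a toy model, not PostgreSQL code: the class name, the field names
and the use of plain integers as LSNs are all assumptions made for
illustration.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    """Toy model of the master's view of WAL progress (hypothetical names)."""

    local_flush_lsn: int = 0    # WAL flushed to the master's own disk
    standby_flush_lsn: int = 0  # WAL the standby has ACKed as flushed to its disk

    def safe_lsn(self) -> int:
        # A page may only reflect WAL that is durable on BOTH nodes.
        return min(self.local_flush_lsn, self.standby_flush_lsn)

    def can_write_page(self, page_lsn: int) -> bool:
        # page_lsn is the LSN of the last WAL record that modified the page.
        return page_lsn <= self.safe_lsn()


state = MasterState()
state.local_flush_lsn = 100            # commit record flushed locally
blocked = state.can_write_page(100)    # standby has not ACKed yet
state.standby_flush_lsn = 100          # ACK arrives from the standby
allowed = state.can_write_page(100)
print(blocked, allowed)
```

With this rule, a data page on the old master never contains changes that
the standby's WAL does not also have.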

>> The main drawback would be increased wait time for the client due to
>> extra round trip to standby before master sends ACK to client. Are there
>> any other issues with this approach?

>I think that you can introduce GUC specifying whether this extra check
>is required to avoid a backup when failback

That would be a better idea. We can disable it whenever taking a fresh
backup is not a problem.
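
As a rough illustration of how such a switch could gate the check, here is a
small Python sketch. The parameter name `require_standby_ack` is made up for
this example and is not an existing PostgreSQL GUC; LSNs are modelled as
plain integers.

```python
def page_write_allowed(
    page_lsn: int,
    local_flush_lsn: int,
    standby_flush_lsn: int,
    require_standby_ack: bool = True,  # hypothetical setting, not a real GUC
) -> bool:
    """Return True if a data page with the given LSN may be written out."""
    # The usual write-ahead rule: WAL must be flushed locally first.
    if page_lsn > local_flush_lsn:
        return False
    # Extra check, applied only when the hypothetical setting is enabled.
    if require_standby_ack:
        return page_lsn <= standby_flush_lsn
    return True


# Standby lags behind: the write is held back only when the check is on.
print(page_write_allowed(100, 100, 90, require_standby_ack=True))
print(page_write_allowed(100, 100, 90, require_standby_ack=False))
```

Turning the setting off gives back today's behaviour, at the cost of needing
a fresh backup on failback.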

Regards,

Samrat

On Mon, Apr 8, 2013 at 10:40 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Mon, Apr 8, 2013 at 7:34 PM, Samrat Revagade
> <revagade(dot)samrat(at)gmail(dot)com> wrote:
> >
> > Hello,
> >
> > We have been trying to figure out possible solutions to the following
> > problem in streaming replication. Consider following scenario:
> >
> > If master receives commit command, it writes and flushes commit WAL
> > records to the disk, It also writes and flushes data page related to this
> > transaction.
> >
> > The master then sends WAL records to standby up to the commit WAL
> > record. But before sending these records if failover happens then, old
> > master is ahead of standby which is now the new master in terms of DB data
> > leading to inconsistent data.
>
> Why do you think that the inconsistent data after failover happens is
> problem? Because it's one of the reasons why a fresh base backup is
> required when starting old master as new standby? If yes, I agree with
> you. I've often heard the complaints about a backup when restarting new
> standby. That's really big problem.
>
> The timeline mismatch after failover was one of the reasons why a
> backup is required. But, thanks to Heikki's recent work, that's solved,
> i.e., the timeline mismatch would be automatically resolved when
> starting replication in 9.3. So, the remaining problem is an
> inconsistent database.
>
> > One solution to avoid this situation is have the master send WAL records
> > to standby and wait for ACK from standby committing WAL files to disk and
> > only after that commit data page related to this transaction on master.
>
> You mean to make the master wait the data page write until WAL has been
> not only flushed to disk but also replicated to the standby?
>
> > The main drawback would be increased wait time for the client due to
> > extra round trip to standby before master sends ACK to client. Are there
> > any other issues with this approach?
>
> I think that you can introduce GUC specifying whether this extra check
> is required to avoid a backup when failback.
>
> Regards,
>
> --
> Fujii Masao
>
