From: | Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com> |
Cc: | Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Patch for fail-back without fresh backup |
Date: | 2013-06-15 07:49:43 |
Message-ID: | CAD21AoAn-kPdxRgFuXW8+=k8S0LH=XhRFcpMq+vxyA6rvbvraw@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
>> Hello,
>
>> We have already started a discussion on pgsql-hackers for the problem of
>> taking fresh backup during the failback operation here is the link for that:
>>
>> http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbJgWrFu513s+Q(at)mail(dot)gmail(dot)com
>
>> Let me again summarize the problem we are trying to address.
>
>> When the master fails, last few WAL files may not reach the standby. But
>> the master may have gone ahead and made changes to its local file system
>> after flushing WAL to the local storage. So master contains some file
>> system level changes that standby does not have. At this point, the data
>> directory of master is ahead of standby's data directory.
>> Subsequently, the standby will be promoted as new master. Later when the
>> old master wants to be a standby of the new master, it can't just join the
>> setup since there is inconsistency in between these two servers. We need
>> to take the fresh backup from the new master. This can happen in both the
>> synchronous as well as asynchronous replication.
>
>> Fresh backup is also needed in case of clean switch-over because in the
>> current HEAD, the master does not wait for the standby to receive all the
>> WAL up to the shutdown checkpoint record before shutting down the
>> connection. Fujii Masao has already submitted a patch to handle clean
>> switch-over case, but the problem is still remaining for failback case.
>
>> The process of taking fresh backup is very time consuming when databases
>> are of very big sizes, say several TB's, and when the servers are connected
>> over a relatively slower link. This would break the service level
>> agreement of disaster recovery system. So there is need to improve the
>> process of disaster recovery in PostgreSQL. One way to achieve this is to
>> maintain consistency between master and standby which helps to avoid need
>> of fresh backup.
>
>> So our proposal on this problem is that we must ensure that master should
>> not make any file system level changes without confirming that the
>> corresponding WAL record is replicated to the standby.
>
> How will you take care of extra WAL on old master during recovery. If it
> plays the WAL which has not reached new-master, it can be a problem.
You mean that the old master's data may be ahead of the new master's
data, so the two servers would be inconsistent at fail-back, right?
If so, I think that inconsistency cannot happen. If you use the GUC
option as he proposes (e.g., failback_safe_standby_mode = remote_flush),
then while the old master is working normally, no file system level
change is made until the corresponding WAL has been replicated to the
standby.
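
To make the ordering rule concrete, here is a toy model in Python. It is
only a sketch of the idea, not PostgreSQL code: apart from the proposed
failback_safe_standby_mode GUC, every name below (OldMaster,
standby_flush_lsn, try_write_page, and so on) is made up for illustration,
and LSNs are plain integers.

```python
# Toy model of the proposed rule; all names are hypothetical.
class OldMaster:
    def __init__(self):
        self.local_wal_flush_lsn = 0  # WAL flushed to the master's own disk
        self.standby_flush_lsn = 0    # WAL the standby has confirmed flushing
        self.data_files = {}          # page id -> LSN of the change written

    def flush_wal_locally(self, lsn):
        self.local_wal_flush_lsn = max(self.local_wal_flush_lsn, lsn)

    def standby_ack(self, lsn):
        # Called when the standby reports how far it has flushed WAL.
        self.standby_flush_lsn = max(self.standby_flush_lsn, lsn)

    def try_write_page(self, page_id, page_lsn):
        # The rule under discussion: a data page may reach the file system
        # only after the WAL describing it is flushed on the standby.
        if page_lsn > self.standby_flush_lsn:
            return False              # caller must wait and retry later
        self.data_files[page_id] = page_lsn
        return True

    def max_data_lsn(self):
        return max(self.data_files.values(), default=0)


master = OldMaster()
master.flush_wal_locally(100)   # WAL up to 100 is on the master's disk
master.standby_ack(80)          # but the standby has only flushed up to 80

blocked = master.try_write_page("p1", 100)   # refused: standby lacks this WAL
allowed = master.try_write_page("p2", 80)    # accepted: standby has this WAL

# If the master crashes now, its data files are not ahead of the standby's WAL.
assert master.max_data_lsn() <= master.standby_flush_lsn
```

Note that the model covers data files only. The master's local WAL (100 in
this example) can still be ahead of what the standby received (80).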
--
Regards,
-------
Sawada Masahiko