Re: Timeline in the light of Synchronous replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: fazool mein <fazoolmein(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timeline in the light of Synchronous replication
Date: 2010-10-18 08:31:03
Message-ID: AANLkTinLfPfqErpuRRaHVdrjVFwCeUDzGdA_5DPyg7Bq@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 14, 2010 at 8:23 AM, fazool mein <fazoolmein(at)gmail(dot)com> wrote:
> The concept of time line makes sense to me in the case of asynchronous
> replication. But in case of synchronous replication, I am not so sure.
>
> When a standby connects to the primary, it checks if both have the same time
> line. If not, it doesn't start.
>
> Now, consider the following scenario. The primary (call it A) fails, the
> standby (call it B), via a trigger file, comes out of recovery mode
> (increments time line id to, say, 2), and morphs into a primary. Now, let's say
> we start the old primary A as a standby, to connect to the new primary B
> (which previously was a standby). As the code is at the moment, the old
> primary A will not be allowed to connect to the new primary B because A's
> timelineid (1) is not equivalent to that of the new primary B (2). Hence, we
> need to create a backup again, and set up the standby from scratch.

Yep.
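
To make the check described above concrete, here is a minimal Python
sketch. It is only an illustration: check_timeline and TimelineMismatch
are hypothetical names, not functions from the PostgreSQL source, and
timeline IDs are modeled as plain integers.

```python
class TimelineMismatch(Exception):
    """Raised when the standby's timeline differs from the primary's."""


def check_timeline(primary_tli: int, standby_tli: int) -> None:
    # Hypothetical stand-in for the comparison made when a standby connects.
    if primary_tli != standby_tli:
        raise TimelineMismatch(
            f"primary is on timeline {primary_tli}, "
            f"standby is on timeline {standby_tli}"
        )


# B was promoted and moved to timeline 2; A is still on timeline 1.
try:
    check_timeline(primary_tli=2, standby_tli=1)
except TimelineMismatch as exc:
    print("A cannot follow B:", exc)
```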

> In the above scenario, if the system was using asynchronous replication,
> time lines would have saved us from doing something wrong. But, if we are
> using synchronous replication, we know that both A and B would have been in
> sync before the failure. In this case, forcing to create a new standby from
> scratch because of time lines might not be very desirable if the database is
> huge.

At least in my sync rep patch, the data buffer flush waits until WAL has
been written to disk, but not until that WAL has arrived at the standby.
So the database in A might be ahead of that in B, even with sync rep. To
avoid this, should we make the buffer flush also wait for replication?
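
As a rough illustration of the two rules being compared here, the
following Python sketch models LSNs as plain integers. The function and
parameter names are assumptions made for this example and are not taken
from the patch.

```python
def can_flush_page(page_lsn, local_flushed_lsn, standby_received_lsn=None):
    """Return True if a dirty data page may be written to disk.

    page_lsn: LSN of the last WAL record that modified the page.
    local_flushed_lsn: how far WAL is durable on the primary's disk.
    standby_received_lsn: how far WAL has reached the standby, or None
        to apply only the local rule (the behavior described above).
    """
    limit = local_flushed_lsn
    if standby_received_lsn is not None:
        # Stricter rule: the page must not get ahead of the standby either.
        limit = min(limit, standby_received_lsn)
    return page_lsn <= limit


# Local rule only: the page can be flushed once WAL is on A's disk.
print(can_flush_page(120, local_flushed_lsn=150))
# Stricter rule: the flush has to wait, because B has only received up to 100.
print(can_flush_page(120, local_flushed_lsn=150, standby_received_lsn=100))
```

With the local rule alone, A's data files can contain changes that B has
never seen; the stricter rule closes that gap at the cost of making
buffer flushes depend on the standby.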

But even if we do that, note that the WAL in A might still be ahead of
that in B. For example, A might crash right after writing WAL to disk
but before sending it to B. So when we restart the old master A as a
standby after failover, we would need to delete the WAL files in A that
are inconsistent with the WAL sequence in B.
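
A small Python sketch of that situation, under simplifying assumptions:
WAL is modeled as a list of integer record LSNs rather than segment
files, and records_to_discard is a hypothetical helper, not an existing
tool.

```python
def records_to_discard(old_primary_wal, promotion_lsn):
    """Return the LSNs on the old primary that the new primary never saw."""
    return [lsn for lsn in old_primary_wal if lsn > promotion_lsn]


a_wal = [100, 110, 120, 130]  # A wrote up to 130 locally before crashing
b_last_received = 120         # B had only received up to 120 when promoted

# Everything on A past B's promotion point conflicts with B's new timeline.
print(records_to_discard(a_wal, b_last_received))  # [130]
```

In practice the cleanup would operate on WAL files rather than individual
records, so this only shows where the divergence point lies.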

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
