Re: Sync Rep v17

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, Daniel Farina <daniel(at)heroku(dot)com>
Subject: Re: Sync Rep v17
Date: 2011-03-02 14:30:42
Message-ID: AANLkTikqSZZSU7xn-mpQPymWr7zoO=8jsjjeQZqrxebV@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The WALSender deliberately does *not* wake waiting users if the standby
> disconnects. Doing so would break the whole reason for having sync rep
> in the first place. What we do is allow a potential standby to takeover
> the role of sync standby, if one is available. Or the failing standby
> can reconnect and then release waiters.

If a potential standby exists when the synchronous standby has gone away, I
agree that it's not a good idea to release the waiting backends immediately.
In that case, those backends should wait for the next synchronous standby.

On the other hand, if there is no potential standby, I think that the waiting
backends should not wait for the timeout and should wake up as soon as the
synchronous standby has gone away. Otherwise, those backends are suspended
for a long time (i.e., until the timeout expires), which would reduce
availability, I'm afraid.

Keeping those backends waiting for the failed standby to reconnect is one
option. But that looks like the behavior for "allow_standalone_primary = off".
If allow_standalone_primary = on, it seems more natural to let the
primary work alone without waiting for the timeout.
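
To make the rule I'm proposing concrete, here is a minimal sketch in Python.
It is only an illustration, not code from the patch: the function name
on_sync_standby_lost, its parameters, and the WaiterAction enum are all made
up for this example, and the count of potential standbys is assumed to be
known at the moment the synchronous standby disconnects.

```python
from enum import Enum


class WaiterAction(Enum):
    KEEP_WAITING = "keep waiting"
    RELEASE_NOW = "release now"


def on_sync_standby_lost(potential_standbys: int,
                         allow_standalone_primary: bool) -> WaiterAction:
    """Decide what waiting backends do when the sync standby disconnects.

    Hypothetical helper illustrating the proposal in this mail.
    """
    if potential_standbys > 0:
        # Another standby can take over the sync role, so keep waiting for it.
        return WaiterAction.KEEP_WAITING
    if allow_standalone_primary:
        # Nobody can take over and the primary is allowed to run alone:
        # wake the waiters at once instead of sleeping until the timeout.
        return WaiterAction.RELEASE_NOW
    # allow_standalone_primary = off: wait for the failed standby to reconnect.
    return WaiterAction.KEEP_WAITING
```

The only case that differs from the current behavior is the middle one: no
potential standby and allow_standalone_primary = on.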

> If we shutdown, then we wait for the shutdown commit record to be
> transferred to our standby, so a normal or fast shutdown of the master
> always leaves all connected standbys up to date. We already do that, so
> sync rep doesn't touch that behaviour. If a standby is disconnected,
> then it doesn't receive the shutdown checkpoint record.
>
> The wait state for a commit does not persist when we shutdown and
> restart.
>
> Can you restate which bits of the above you think need to be changed?

What I'm thinking is: when the waiting backends are released because
of the timeout while a fast shutdown is in progress on the master,
those backends should not return a success indication to the client.
Of course, in that case, the WAL has already been flushed on the master,
but I think that those backends should exit with a FATAL error before
returning success. This avoids breaking the synchronous replication
rule, i.e., every transaction that the client knows as committed must
be committed on the synchronous standby after failover.

If we allow those backends to return success in that situation, the
following scenario, which can cause data loss, can happen.

1. The primary is running with allow_standalone_primary = on. There
is only one (synchronous) standby connected.
2. The replication connection is closed because of a network outage.
3. While some backends are waiting for replication, the user requests
a fast shutdown on the master.
4. Since the timeout expires, those backends stop waiting and return
a success indication to the client (but the commits have not been
replicated to the standby).
5. Since there is no backend waiting for replication, the fast shutdown
completes.
6. Clusterware such as Pacemaker detects the death of the primary
and triggers the failover.
7. The new primary doesn't have some transactions that were reported
to the client as committed, i.e., transactions are lost!!
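
The steps above can be walked through with a toy model. This is a minimal
Python sketch under simplifying assumptions, not a model of the real
WALSender or backend code: commits are plain integers, the function name
run_scenario and the flag fatal_on_shutdown_timeout are hypothetical, and
the flag stands for the behavior I'm proposing (exit with FATAL instead of
returning success).

```python
def run_scenario(fatal_on_shutdown_timeout: bool) -> list:
    """Return the commits the client saw as successful but the standby lacks."""
    primary_wal = []   # commits flushed on the primary
    standby_wal = []   # commits replicated to the standby
    acked = []         # commits reported to the client as successful

    # Step 1: one synchronous standby is connected; commit 1 is replicated
    # before the client is told that it succeeded.
    primary_wal.append(1)
    standby_wal.append(1)
    acked.append(1)

    # Step 2: the replication connection is lost, so nothing more reaches
    # the standby from here on.

    # Step 3: commit 2 is flushed locally and its backend waits for
    # replication; meanwhile a fast shutdown is requested.
    primary_wal.append(2)
    shutting_down = True

    # Step 4: the timeout expires and the waiting backend is released.
    if shutting_down and fatal_on_shutdown_timeout:
        pass  # proposed: the backend exits with FATAL, no success is reported
    else:
        acked.append(2)  # current: success is reported without replication

    # Steps 5-7: the primary shuts down and the standby is promoted.
    # Anything acknowledged but missing on the standby is lost.
    return [xid for xid in acked if xid not in standby_wal]
```

With fatal_on_shutdown_timeout = False the function returns [2], i.e. one
acknowledged commit is missing after failover; with True it returns [],
because the client was never told that commit 2 succeeded.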

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
