Re: Timeout and wait-forever in sync rep

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timeout and wait-forever in sync rep
Date: 2010-10-16 13:02:38
Message-ID: AANLkTik1_eRB5RsozTuKd4=uS8uZ=fg+yh-mrv76epNM@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 15, 2010 at 8:41 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> As a result of the discussion, I think that we need the following two
> parameters for the case where the standby goes down.
>
> * replication_timeout
>  This is the maximum time to wait for the ACK from the standby. If this
>  timeout expires, the master closes the replication connection and
>  disconnects the standby. This parameter is used only so that the master
>  can detect a standby crash or a network outage.
>
>  We already have keepalive parameters for that purpose. But they cannot
>  detect the disconnection in some cases. So replication_timeout needs
>  to be introduced for sync rep.

Good design, +1.

> * allow_standalone_master
>  This specifies whether we allow the master to process transactions
>  alone when there is no connected and sync'd standby.
>
>  If this is false, all the transactions on the master are blocked until
>  a sync'd standby has appeared. Of course, this happens not only when
>  replication_timeout expires but also when we start the master alone
>  at the initial setup, when the master detects the disconnection by
>  using keepalive parameters, and when the standby is shut down normally.
>  People who want 'wait-forever' will disable this parameter to reduce
>  the risk of data loss.
>
>  OTOH, if this is true, the absence of a sync'd standby doesn't prevent
>  the master from processing transactions alone. People who want high
>  availability even though the risk of data loss increases will enable
>  this parameter.

I'm not wild about the name, but otherwise this seems well-designed.
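
A minimal sketch of the semantics as described in the quoted text, written in
Python purely for illustration. The function name and the `synced_standbys`
argument are hypothetical; only `allow_standalone_master` comes from the
proposal, and nothing here reflects an actual patch.

```python
def master_may_process_transactions(allow_standalone_master: bool,
                                    synced_standbys: int) -> bool:
    """Return True if the master is allowed to process transactions now.

    Assumption: `synced_standbys` is the number of standbys that are both
    connected and sync'd. With at least one such standby, transactions
    proceed (still waiting for the standby's ACK as usual). With none, the
    outcome depends only on allow_standalone_master.
    """
    if synced_standbys > 0:
        return True
    # No sync'd standby: either run alone (availability) or block
    # until one appears ('wait-forever').
    return allow_standalone_master
```

The same check applies whatever the reason the standby is missing: initial
setup, timeout expiry, keepalive-detected disconnection, or a normal shutdown.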

> The timeout doesn't conflict with 'wait-forever'. Even if you choose
> 'wait-forever' (i.e., you set allow_standalone_master to false), the
> master should detect the standby crash as soon as possible by using the
> timeout. For example, imagine that max_wal_senders is set to one and
> the master cannot detect the standby crash because there is no
> timeout. In this case, even if you start a new standby, it will not be
> able to connect to the master since there is no free walsender slot.
> As a result, the master actually waits forever.

Good point.
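
The scenario in the quoted paragraph can be modelled with a toy simulation.
This is a sketch in Python under stated assumptions, not PostgreSQL code: the
`Master` class, its `connect` and `reap` methods, and the use of plain numbers
as seconds are all invented for illustration. Only `max_wal_senders` and
`replication_timeout` are names from the thread.

```python
class Master:
    """Toy model of walsender slot accounting on the master (hypothetical)."""

    def __init__(self, max_wal_senders, replication_timeout=None):
        self.max_wal_senders = max_wal_senders
        # None models a master that has no replication timeout at all.
        self.replication_timeout = replication_timeout
        self.slots = {}  # standby name -> time of its last ACK

    def reap(self, now):
        """Free the slots of standbys whose last ACK is older than the timeout."""
        if self.replication_timeout is None:
            return
        for name, last_ack in list(self.slots.items()):
            if now - last_ack > self.replication_timeout:
                del self.slots[name]

    def connect(self, name, now):
        """Try to attach a standby; fail if no walsender slot is free."""
        self.reap(now)
        if len(self.slots) >= self.max_wal_senders:
            return False
        self.slots[name] = now
        return True


# Without a timeout, the crashed standby "a" keeps the only slot forever.
no_timeout = Master(max_wal_senders=1)
no_timeout.connect("a", now=0)
stuck = no_timeout.connect("b", now=1000)

# With a timeout, the stale slot is released and the new standby gets in.
with_timeout = Master(max_wal_senders=1, replication_timeout=30)
with_timeout.connect("a", now=0)
recovered = with_timeout.connect("b", now=31)
```

Here `stuck` ends up False and `recovered` ends up True, which is the
difference the timeout makes even when allow_standalone_master is false.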

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
