
Timeout and wait-forever in sync rep

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Timeout and wait-forever in sync rep
Date: 2010-10-15 12:41:42
Message-ID: AANLkTikP0dGiOzr6zh0v-VthZ+Dwbt3kh3vEKQsZ0Xon@mail.gmail.com
Lists: pgsql-hackers
Hi,

As a result of the discussion, I think that we need the following two
parameters for the case where the standby goes down.

* replication_timeout
  This is the maximum time to wait for the ACK from the standby. If this
  timeout expires, the master closes the replication connection and
  disconnects the standby. This parameter exists only so that the master
  can detect a standby crash or a network outage.

  We already have keepalive parameters for that purpose. But they cannot
  detect the disconnection in some cases. So replication_timeout needs
  to be introduced for sync rep.

* allow_standalone_master
  This specifies whether we allow the master to process transactions
  alone when there is no connected and sync'd standby.

  If this is false, all transactions on the master are blocked until a
  sync'd standby appears. Of course, this happens not only when
  replication_timeout expires but also when we start the master alone
  at the initial setup, when the master detects the disconnection by
  using keepalive parameters, and when the standby is shut down normally.
  People who want 'wait-forever' will disable this parameter to reduce
  the risk of data loss.

  OTOH, if this is true, the absence of a sync'd standby doesn't prevent
  the master from processing transactions alone. People who want high
  availability even though the risk of data loss increases will enable
  this parameter.
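
To make the intended semantics concrete, here is a rough sketch in
Python of how the two settings might interact when a transaction tries
to commit. It is only a model of the proposal, not PostgreSQL code. The
names Standby, ack_timed_out and commit_action, and the three returned
strings, are made up for illustration; only replication_timeout and
allow_standalone_master come from the description above. It also
assumes that a standby whose ACK has timed out is simply treated as
disconnected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Standby:
    """Hypothetical model of one connected, sync'd standby."""
    last_ack_time: float  # seconds; when the master last received an ACK


def ack_timed_out(standby: Standby, now: float,
                  replication_timeout: float) -> bool:
    """True if the standby has been silent longer than replication_timeout."""
    return (now - standby.last_ack_time) > replication_timeout


def commit_action(standby: Optional[Standby], now: float,
                  replication_timeout: float,
                  allow_standalone_master: bool) -> str:
    """Return what the master does with a commit at time `now`."""
    # Assumption: an expired timeout means the master drops the standby.
    if standby is not None and ack_timed_out(standby, now, replication_timeout):
        standby = None

    if standby is not None:
        return "wait_for_ack"          # normal sync rep behaviour
    if allow_standalone_master:
        return "commit_standalone"     # high availability, more risk of data loss
    return "block_until_standby"       # 'wait-forever'
```

In this model, a master with no standby (or with one that has timed
out) either commits alone or blocks, depending only on
allow_standalone_master.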

The timeout does not conflict with 'wait-forever'. Even if you choose
'wait-forever' (i.e., you set allow_standalone_master to false), the
master should detect the standby crash as soon as possible by using the
timeout. For example, imagine that max_wal_senders is set to one and
the master cannot detect the standby crash because there is no
timeout. In this case, even if you start a new standby, it will not be
able to connect to the master since there is no free walsender slot.
As a result, the master actually waits forever.
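
The walsender slot scenario can be illustrated with a toy model. Again
this is only a sketch in Python, not how the master is implemented: the
Master class and its methods are hypothetical, and passing None for
replication_timeout stands for "no timeout at all". A crashed standby
is modelled as one that stops sending ACKs without closing its
connection.

```python
from typing import Dict, Optional


class Master:
    """Toy model of walsender slot accounting (not PostgreSQL code)."""

    def __init__(self, max_wal_senders: int,
                 replication_timeout: Optional[float]):
        self.max_wal_senders = max_wal_senders
        self.replication_timeout = replication_timeout  # None = no timeout
        self.slots: Dict[str, float] = {}  # standby name -> time of last ACK

    def reap_dead_standbys(self, now: float) -> None:
        """Free the slots of standbys whose ACK is overdue."""
        if self.replication_timeout is None:
            return  # without a timeout, a silent crash is never noticed
        for name, last_ack in list(self.slots.items()):
            if now - last_ack > self.replication_timeout:
                del self.slots[name]

    def try_connect(self, name: str, now: float) -> bool:
        """A standby asks for a walsender slot; True if it gets one."""
        self.reap_dead_standbys(now)
        if len(self.slots) >= self.max_wal_senders:
            return False  # no free walsender slot
        self.slots[name] = now
        return True


# max_wal_senders = 1, old standby connects at t=0 and then crashes silently.
without_timeout = Master(max_wal_senders=1, replication_timeout=None)
without_timeout.try_connect("old_standby", now=0.0)
new_standby_ok_without = without_timeout.try_connect("new_standby", now=100.0)

with_timeout = Master(max_wal_senders=1, replication_timeout=30.0)
with_timeout.try_connect("old_standby", now=0.0)
new_standby_ok_with = with_timeout.try_connect("new_standby", now=100.0)
```

Without the timeout the dead standby keeps the only slot, so the new
standby is refused; with the timeout the slot is freed and the new
standby can connect.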

Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
