Skip site navigation (1) Skip section navigation (2)

Re: Sync Rep Design

From: Robert Treat <rob(at)xzilla(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sync Rep Design
Date: 2010-12-30 20:07:16
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote:
> > it would help if this would just be a simple text-only description of
> > the design that people can actually comment on inline. I don't think
> > sending technical design proposals as a pdf (which seems to be written
> > in doc-style as well) is a good idea to encourage discussion on -hackers
> :(
> 25.2.6. Synchronous Replication
> Streaming replication is by default asynchronous. Transactions on the
> primary server write commit records to WAL, yet do not know whether or
> when a standby has received and processed those changes. So with
> asynchronous replication, if the primary crashes, transactions committed
> on the primary might not have been received by any standby. As a result,
> failover from primary to standby could cause data loss because
> transaction completions are absent, relative to the primary. The amount
> of data loss is proportional to the replication delay at the time of
> failover.
> Synchronous replication offers the ability to guarantee that all changes
> made by a transaction have been transferred to at least one remote
> standby server. This is an extension to the standard level of durability
> offered by a transaction commit. This is referred to as semi-synchronous
> replication.
> When synchronous replication is requested, the commit of a write
> transaction will wait until confirmation that the commit record has been
> transferred successfully to at least one standby server. Waiting for
> confirmation increases the user's confidence that the changes will not
> be lost in the event of server crashes but it also necessarily increases
> the response time for the requesting transaction. The minimum wait time
> is the roundtrip time from primary to standby.
> Read only transactions and transaction rollbacks need not wait for
> replies from standby servers. Subtransaction commits do not wait for
> responses from standby servers, only final top-level commits. Long
> running actions such as data loading or index building do not wait until
> the very final commit message.
> Basic Configuration
> Synchronous replication must be enabled on both the primary and at least
> one standby server. If synchronous replication is disabled on the
> master, or enabled on the primary but not enabled on any slaves, the
> primary will use asynchronous replication by default.
> We use a single parameter to enable synchronous replication, set in
> postgresql.conf on both primary and standby servers:
> synchronous_replication = off (default) | on
> On the primary, synchronous_replication can be set for particular users
> or databases, or dynamically by applications programs.
This seems like a potential issue, where I start a server with this off, and
then I start turning it on for specific transactions; it isn't exactly clear
what happens, since there may or may not be a running synchronous rep slave
available.  (I love the idea though)

> If more than one standby server specifies synchronous_replication, then
> whichever standby replies first will release waiting commits.
I don't want you to think I am setting an expectation, but I'm curious about
the possibility of requiring more than 1 server to reply?

> Turning this setting off for a standby allows the administrator to
> exclude certain standby servers from releasing waiting transactions.
> This is useful if not all standby servers are designated as potential
> future primary servers. On the standby, this parameter only takes effect
> at server start.
> Planning for Performance
> Synchronous replication usually requires carefully planned and placed
> standby servers to ensure applications perform acceptably. Waiting
> doesn't utilise system resources, but transaction locks continue to be
> held until the transfer is confirmed. As a result, incautious use of
> synchronous replication will reduce performance for database
> applications because of increased response times and higher contention.
> PostgreSQL allows the application developer to specify the durability
> level required via replication. This can be specified for the system
> overall, though it can also be specified for specific users or
> connections, or even individual transactions.
> For example, an application workload might consist of: 10% of changes
> are important customer details, while 90% of changes are less important
> data that the business can more easily survive if it is lost, such as
> chat messages between users.
> With synchronous replication options specified at the application level
> (on the master) we can offer sync rep for the most important changes,
> without slowing down the bulk of the total workload. Application level
> options are an important and practical tool for allowing the benefits of
> synchronous replication for high performance applications. This feature
> is unique to PostgreSQL.
> Planning for High Availability
> The easiest and safest method of gaining High Availability using
> synchronous replication is to configure at least two standby servers. To
> understand why, we need to examine what can happen when you lose all
> standby servers.
> Commits made when synchronous_replication is set will wait until at
> least one standby responds. The response may never occur if the last, or
> only, standby should crash or the network drops. What should we do in
> that situation?
> Sitting and waiting will typically cause operational problems because it
> is an effective outage of the primary server. Allowing the primary
> server to continue processing in the absence of a standby puts those
> latest data changes at risk. How we handle this situation is controlled
> by allow_standalone_primary. The default setting is on, allowing
> processing to continue, though there is no recommended setting. Choosing
> the best setting for allow_standalone_primary is a difficult decision
> and best left to those with combined business responsibility for both
> data and applications. The difficulty of this choice is the reason why
> we recommend that you reduce the possibility of this situation occurring
> by using multiple standby servers.
> When the primary is started with allow_standalone_primary enabled, the
> primary will not allow connections until a standby connects that also
> has synchronous_replication enabled. This is a convenience to ensure
> that we don't allow connections before write transactions will return
> successfully.
I think you mean "disabled" at the start here, right?

> When allow_standalone_primary is set, a user will stop waiting once the
> replication_timeout has been reached for their specific session. Users
> are not waiting for a specific standby to reply, they are waiting for a
> reply from any standby, so the unavailability of any one standby is not
> significant to a user. It is possible for user sessions to hit timeout
> even though standbys are communicating normally. In that case, the
> setting of replication_timeout is probably too low.
will a notice or warning be thrown in these cases? I'm thinking something
like the checkpoint timeout warning, but could be something else; it just
seems to me you need some way to know you're timing out.

> The standby sends regular status messages to the primary. If no status
> messages have been received for replication_timeout the primary server
> will assume the connection is dead and terminate it. This happens
> whatever the setting of allow_standalone_primary.
Does the standby attempt to reconnect in these scenarios?

> If primary crashes while commits are waiting for acknowledgement, those
> transactions will be marked fully committed if the primary database
> recovers, no matter how allow_standalone_primary is set.

This seems backwards; if you are waiting for acknowledgement, wouldn't the
normal assumption be that the transactions *didnt* make it to any standby,
and should be rolled back ?

> There is no way
> to be certain that all standbys have received all outstanding WAL data
> at time of the crash of the primary. Some transactions may not show as
> committed on the standby, even though they show as committed on the
> primary. The guarantee we offer is that the application will not receive
> explicit acknowledgement of the successful commit of a transaction until
> the WAL data is known to be safely received by the standby. Hence this
> mechanism is technically "semi synchronous" rather than "fully
> synchronous" replication. Note that replication still not be fully
> synchronous even if we wait for all standby servers, though this would
> reduce availability, as described previously.
I think we ought to have an example of the best configuration for "cannot
afford to lose any data" scenarios, where we would prefer an overall service
interruption over the chance of having the primary / secondary out of

> If you need to re-create a standby server while transactions are
> waiting, make sure that the commands to run pg_start_backup() and
> pg_stop_backup() are run in a session with synchronous_replication =
> off, otherwise those requests will wait forever for the standby to
> appear.
> 18.5.5. Synchronous Replication
> These settings control the behavior of the built-in synchronous
> replication feature. These parameters would be set on the primary server
> that is to send replication data to one or more standby servers.
> synchronous_replication (boolean)
>        Specifies whether transaction commit will wait for WAL records
>        to be replicated before the command returns a "success"
>        indication to the client. The default setting is off. When on,
>        there will be a delay while the client waits for confirmation of
>        successful replication. That delay will increase depending upon
>        the physical distance and network activity between primary and
>        standby. The commit wait will last until the first reply from
>        any standby. Multiple standby servers allow increased
>        availability and possibly increase performance as well.
>        The parameter must be set on both primary and standby.
>        On the primary, this parameter can be changed at any time; the
>        behavior for any one transaction is determined by the setting in
>        effect when it commits. It is therefore possible, and useful, to
>        have some transactions replicate synchronously and others
>        asynchronously. For example, to make a single multistatement
>        transaction commit asynchronously when the default is
>        synchronous replication, issue SET LOCAL synchronous_replication
>        TO OFF within the transaction.
>        On the standby, the parameter value is taken only at server
>        start.
> synchronous_replication_timeout (boolean)
>        If the client has synchronous_replication set, and
>        allow_standalone_primary is also set, then the commit will wait
>        for up to synchronous_replication_timeout milliseconds before it
>        returns a "success", or will wait forever if
>        synchronous_replication_timeout is set to -1.
>        If a standby server does not reply for
>        synchronous_replication_timeout the primary will terminate the
>        replication connection.
> allow_standalone_primary (boolean)
>        If allow_standalone_primary is not set, then the server will not
>        allow connections until a standby connects that has
>        synchronous_replication enabled.
>        allow_standalone_primary also affects the behaviour when the
>        synchronous_replication_timeout is reached.
somewhat concerned that we seem to need to use double negatives to describe
whats going on here. it makes me think we ought to rename this to
require_synchronous_standby or similar.

> 25.5.2. Handling query conflicts
> ….
> Remedial possibilities exist if the number of standby-query
> cancellations is found to be unacceptable. Typically the best option is
> to enable hot_standby_feedback. This prevents VACUUM from removing
> recently-dead rows and so cleanup conflicts do not occur. If you do
> this, you should note that this will delay cleanup of dead rows on the
> primary, which may result in undesirable table bloat. However, the
> cleanup situation will be no worse than if the standby queries were
> running directly on the primary server. You are still getting the
> benefit of off-loading execution onto the standby and the query may
> complete faster than it would have done on the primary server.
> max_standby_archive_delay must be kept large in this case, because
> delayed WAL files might already contain entries that conflict with the
> desired standby queries.
> …
> 18.5.6. Standby Servers
> These settings control the behavior of a standby server that is to
> receive replication data.
> hot_standby (boolean)
>        Specifies whether or not you can connect and run queries during
>        recovery, as described in Section 25.5. The default value is
>        off. This parameter can only be set at server start. It only has
>        effect during archive recovery or in standby mode.
> hot_standby_feedback (boolean)
>        Specifies whether or not a hot standby will send feedback to the
>        primary about queries currently executing on the standby. This
>        parameter can be used to eliminate query cancels caused by
>        cleanup records, though it can cause database bloat on the
>        primary for some workloads. The default value is off. This
>        parameter can only be set at server start. It only has effect if
>        hot_standby is enabled.
i was expecting this section to mention the synchronous_replication (bool)
somewhere, to control if the standby will participate synchronously or
asynch; granted it's the same config as listed in 18.5.5 right? Just that
the heading of that section specifically targets the primary.

HTH, looks pretty good at first glance.

Robert Treat

In response to


pgsql-hackers by date

Next:From: Marti RaudseppDate: 2010-12-30 20:11:40
Subject: Re: Sync Rep Design
Previous:From: Tomas VondraDate: 2010-12-30 19:38:03
Subject: Re: estimating # of distinct values

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group