Re: Sync Rep Design

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sync Rep Design
Date: 2010-12-30 20:55:28
Message-ID: 1293742528.1892.26147.camel@ebony
Lists: pgsql-hackers

On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
> > When allow_standalone_primary is set, a user will stop waiting once the
> > replication_timeout has been reached for their specific session. Users
> > are not waiting for a specific standby to reply, they are waiting for a
> > reply from any standby, so the unavailability of any one standby is not
> > significant to a user. It is possible for user sessions to hit timeout
> > even though standbys are communicating normally. In that case, the
> > setting of replication_timeout is probably too low.
> >
> >
> Will a notice or warning be thrown in these cases? I'm thinking something
> like the checkpoint timeout warning, but could be something else; it just
> seems to me you need some way to know you're timing out.

We can do that, yes.
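
To make the "any standby" wait concrete, here is a minimal sketch. It is
purely illustrative and is not code from the patch. The function name
wait_for_any_standby and the ack_delays input are assumptions made for this
example; only replication_timeout and the allow_standalone_primary behaviour
come from the text quoted above. The timed-out result is the point at which
a notice or warning could be raised.

```python
from typing import Iterable, Optional, Tuple


def wait_for_any_standby(
    ack_delays: Iterable[Optional[float]],
    replication_timeout: float,
) -> Tuple[bool, float]:
    """Model one session's wait when allow_standalone_primary is set.

    ack_delays holds, per standby, the seconds until its reply arrives,
    or None if that standby never replies (hypothetical input format).
    Returns (acknowledged, seconds_waited).
    """
    # The session waits for the first reply from ANY standby,
    # so a single unavailable standby does not matter.
    arrived = [delay for delay in ack_delays if delay is not None]
    first_reply = min(arrived, default=None)

    if first_reply is not None and first_reply <= replication_timeout:
        return True, first_reply

    # No standby replied in time: the session stops waiting.
    return False, replication_timeout
```

With replication_timeout set to 1.0, one dead standby and one that replies
after 0.2 seconds still give an acknowledged commit; if the only reply
takes 3.0 seconds, the session gives up after 1.0 seconds.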

> > The standby sends regular status messages to the primary. If no status
> > messages have been received for replication_timeout the primary server
> > will assume the connection is dead and terminate it. This happens
> > whatever the setting of allow_standalone_primary.
> >
> >
> Does the standby attempt to reconnect in these scenarios?

Yes, it would, but we terminated the connection because it had stopped
talking, so it is probably dead.
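
The dead-connection check described in the quoted text can be sketched as
follows. This is an illustration only, not the server's implementation:
the function name and the use of plain timestamps in seconds are
assumptions, and treating "exactly replication_timeout" as timed out is a
choice made for the example.

```python
def connection_timed_out(
    last_status_at: float,
    now: float,
    replication_timeout: float,
) -> bool:
    """Return True if the primary should treat the standby connection as dead.

    last_status_at and now are timestamps in seconds (hypothetical inputs).
    The check does not consult allow_standalone_primary, matching the
    statement that termination happens whatever that setting is.
    """
    silent_for = now - last_status_at
    return silent_for >= replication_timeout
```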

> > If primary crashes while commits are waiting for acknowledgement, those
> > transactions will be marked fully committed if the primary database
> > recovers, no matter how allow_standalone_primary is set.
>
>
> This seems backwards; if you are waiting for acknowledgement, wouldn't the
> normal assumption be that the transactions *didn't* make it to any standby,
> and should be rolled back?

Well, we can't roll it back. We have already written the commit record
to WAL.
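
A toy model of why that is so, offered as a sketch and not as PostgreSQL's
recovery code: crash recovery replays the WAL, and a transaction whose
commit record is found there comes back as committed. The record format
(a type string plus a transaction id) and the function name are
assumptions for illustration, and abort records are left out. Note that
standby acknowledgement is not an input at all.

```python
from typing import Iterable, Set, Tuple


def committed_after_recovery(wal: Iterable[Tuple[str, int]]) -> Set[int]:
    """Return the transaction ids that are committed once the WAL is replayed.

    Each WAL entry is (record_type, xid), a simplified hypothetical format.
    """
    committed: Set[int] = set()
    for record_type, xid in wal:
        # A commit record on disk is what makes the transaction committed;
        # whether any standby acknowledged it is never consulted.
        if record_type == "commit":
            committed.add(xid)
    return committed
```

For a WAL holding an insert and a commit for transaction 1 and only an
insert for transaction 2, recovery leaves transaction 1 committed and
transaction 2 not, even if the session for transaction 1 was still waiting
for a standby when the primary crashed.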

> > There is no way
> > to be certain that all standbys have received all outstanding WAL data
> > at time of the crash of the primary. Some transactions may not show as
> > committed on the standby, even though they show as committed on the
> > primary. The guarantee we offer is that the application will not receive
> > explicit acknowledgement of the successful commit of a transaction until
> > the WAL data is known to be safely received by the standby. Hence this
> > mechanism is technically "semi synchronous" rather than "fully
> > synchronous" replication. Note that replication will still not be fully
> > synchronous even if we wait for all standby servers, though this would
> > reduce availability, as described previously.
> >
> >
> I think we ought to have an example of the best configuration for "cannot
> afford to lose any data" scenarios, where we would prefer an overall service
> interruption over the chance of having the primary / secondary out of
> synch.

I say "use two or more standbys" more than once...

> >>
> >
> Somewhat concerned that we seem to need to use double negatives to describe
> what's going on here. It makes me think we ought to rename this to
> require_synchronous_standby or similar.

Don't see why we can't use double negatives. ;-)

The parameter is named directly from Fujii Masao's suggestion.

> > 18.5.6. Standby Servers
> > These settings control the behavior of a standby server that is to
> > receive replication data.
> >

...

> I was expecting this section to mention the synchronous_replication (bool)
> somewhere, to control if the standby will participate synchronously or
> asynch; granted it's the same config as listed in 18.5.5, right? Just that
> the heading of that section specifically targets the primary.

OK, good idea.

> HTH, looks pretty good at first glance.

Thanks.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
