Re: Issues with Quorum Commit

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issues with Quorum Commit
Date: 2010-10-08 21:05:09
Message-ID: 1286571909.2304.1026.camel@ebony
Lists: pgsql-hackers

On Fri, 2010-10-08 at 16:34 -0400, Greg Smith wrote:
> Tom Lane wrote:
> > How are you going to "mark the standby as degraded"? The
> > standby can't keep that information, because it's not even connected
> > when the master makes the decision.
>
> From a high level, I'm assuming only that the master has a list in
> memory of the standby system(s) it believes are up to date, and that it
> is supposed to commit to synchronously. When I say mark as degraded, I
> mean that the master merely closes whatever communications channel it
> had open with that system and removes the standby from that list.

My current coding works with two sets of parameters:

The "master marks standby as degraded" part is handled by TCP keepalives.
When the master notices no response, it kicks out the standby. We already
had this, so I never mentioned it before as being part of the solution.

The second part is synchronous_replication_timeout, a user-settable
parameter defining how long the application is prepared to wait. That
could be more or less time than the keepalives take to notice a dead
standby.
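
To illustrate how the two limits interact, here is a minimal Python sketch, not the actual patch. It assumes the keepalive mechanism can be modeled as "time since the standby last responded" (real TCP keepalives are handled by the kernel), and the names CommitWaitModel, keepalive_limit and the returned status strings are hypothetical. It only reports which condition was hit; what the master does after a timeout or with no sync standby left is a policy question this sketch does not answer.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    name: str
    last_response: float  # time the master last heard from this standby
    acked_lsn: int = 0    # highest log position the standby has acknowledged


@dataclass
class CommitWaitModel:
    """Hypothetical model of a master tracking its sync standbys."""
    keepalive_limit: float                      # stand-in for TCP keepalive expiry
    sync_standbys: dict = field(default_factory=dict)

    def evict_silent_standbys(self, now: float) -> None:
        # "Kick out" any standby that has been silent for too long.
        silent = [n for n, s in self.sync_standbys.items()
                  if now - s.last_response > self.keepalive_limit]
        for name in silent:
            del self.sync_standbys[name]

    def commit_status(self, commit_lsn: int, waited: float,
                      sync_timeout: float, now: float) -> str:
        # Limit 1: keepalive-style eviction, independent of any one commit.
        self.evict_silent_standbys(now)
        if not self.sync_standbys:
            return "no_sync_standby"
        if any(s.acked_lsn >= commit_lsn for s in self.sync_standbys.values()):
            return "acknowledged"
        # Limit 2: the per-application wait, which may be shorter or longer.
        if waited >= sync_timeout:
            return "timed_out"
        return "waiting"
```

Because the two limits are checked separately, a short application timeout can expire while the standby is still on the list, and a long one can outlast the standby being kicked out.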

> If that standby now reconnects again, I don't see how resolving what
> happens at that point is any different from when a standby is first
> started after both systems were turned off. If the standby is current
> with the data available on the master when it has an initial
> conversation, great; it's now available for synchronous commit too
> then. If it's not, it goes into a catchup mode first instead. When the
> master sees you're back to current again, if you're on the list of sync
> servers too you go back onto the list of active sync systems.
>
> There shouldn't be any state information to save here. If the master
> and standby can't figure out if they are in or out of sync with one
> another based on the conversation they have when they first connect to
> one another, that suggests to me there need to be improvements made in
> the communications protocol they use to exchange messages.

Agreed.
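
The reconnect logic Greg describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the real protocol: it assumes positions can be compared as plain integers, and SyncListModel, report_position and mark_degraded are hypothetical names. The point is that the master decides everything from the position the standby reports when it connects, with no saved "degraded" state.

```python
from enum import Enum


class StandbyState(Enum):
    CATCHUP = "catchup"  # behind the master, not yet eligible for sync commit
    SYNC = "sync"        # current and on the active sync list
    ASYNC = "async"      # current, but not configured as a sync server


class SyncListModel:
    """Hypothetical in-memory view the master keeps of its standbys."""

    def __init__(self, current_lsn, configured_sync):
        self.current_lsn = current_lsn               # master's log position
        self.configured_sync = set(configured_sync)  # allowed sync servers
        self.active_sync = set()                     # currently waited on

    def mark_degraded(self, name):
        # Degrading a standby is just removing it from the in-memory list
        # (the real master would also close the connection).
        self.active_sync.discard(name)

    def report_position(self, name, standby_lsn):
        # Used for the initial conversation and for later progress reports.
        if standby_lsn < self.current_lsn:
            self.active_sync.discard(name)
            return StandbyState.CATCHUP
        if name in self.configured_sync:
            self.active_sync.add(name)
            return StandbyState.SYNC
        return StandbyState.ASYNC
```

A standby that was kicked out and a standby starting for the first time take the same path here: both report a position, and both go through catchup only if that position is behind.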

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
