From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Greg Smith <greg(at)2ndquadrant(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Issues with Quorum Commit |
Date: | 2010-10-08 21:05:09 |
Message-ID: | 1286571909.2304.1026.camel@ebony |
Lists: | pgsql-hackers |
On Fri, 2010-10-08 at 16:34 -0400, Greg Smith wrote:
> Tom Lane wrote:
> > How are you going to "mark the standby as degraded"? The
> > standby can't keep that information, because it's not even connected
> > when the master makes the decision.
>
> From a high level, I'm assuming only that the master has a list in
> memory of the standby system(s) it believes are up to date, and that it
> is supposed to commit to synchronously. When I say mark as degraded, I
> mean that the master merely closes whatever communications channel it
> had open with that system and removes the standby from that list.
My current coding works with two sets of parameters:

The "master marks standby as degraded" part is handled by the TCP
keepalives. When the master notices no response, it kicks out the
standby. We already had this, so I never mentioned it before as being
part of the solution.

The second part is the synchronous_replication_timeout, which is a
user-settable parameter defining how long the app is prepared to wait.
That could be more or less time than the keepalives.
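
To make the split between the two settings concrete, here is a rough sketch in Python rather than the actual patch code. Everything in it is an assumption made for illustration: the Master class, its field names, and the plain-number clock are hypothetical, and it only models the point that the keepalive limit and the commit wait are independent of each other. It says nothing about what the master should do once a wait times out.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    # Hypothetical stand-ins for the two independent settings.
    keepalive_limit: float    # seconds of silence before a standby is kicked out
    sync_rep_timeout: float   # seconds a commit is prepared to wait for an ack
    # In-memory list of connected standbys: name -> time of last response.
    last_heard: dict = field(default_factory=dict)

    def heard_from(self, name, now):
        """Record that a standby responded at time `now`."""
        self.last_heard[name] = now

    def drop_silent_standbys(self, now):
        """Keepalive side: forget any standby that has stopped responding."""
        dropped = [n for n, t in self.last_heard.items()
                   if now - t > self.keepalive_limit]
        for n in dropped:
            del self.last_heard[n]
        return dropped

    def commit_outcome(self, ack_delay):
        """Timeout side: ack_delay is seconds until an ack, or None if none comes."""
        if not self.last_heard:
            return "no sync standby connected"
        if ack_delay is not None and ack_delay <= self.sync_rep_timeout:
            return "acknowledged"
        return "wait timed out"
```

With keepalive_limit=30 and sync_rep_timeout=5, a commit whose ack takes 10 seconds times out while the standby is still on the list. With the values swapped, the standby would be kicked out before any commit wait expired.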
> If that standby now reconnects again, I don't see how resolving what
> happens at that point is any different from when a standby is first
> started after both systems were turned off. If the standby is current
> with the data available on the master when it has an initial
> conversation, great; it's now available for synchronous commit too
> then. If it's not, it goes into a catchup mode first instead. When the
> master sees you're back to current again, if you're on the list of sync
> servers too you go back onto the list of active sync systems.
>
> There shouldn't be any state information to save here. If the master
> and standby can't figure out if they are in or out of sync with one
> another based on the conversation they have when they first connect to
> one another, that suggests to me there needs to be improvements made in
> the communications protocol they use to exchange messages.
Agreed.
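
As a sketch of the reconnect logic Greg describes, under the assumption that each side can report a comparable log position when they first talk: the function names, the integer positions, and the two sets below are all hypothetical, not anything from the patch. The only state is the master's in-memory set of active sync standbys, and the decision is made entirely from the initial conversation.

```python
def on_connect(master_pos, standby_pos, name, configured_sync, active_sync):
    """Decide a standby's role from what it reports when it connects.

    master_pos and standby_pos are integers standing in for log positions.
    configured_sync is the set of names listed as sync servers.
    active_sync is the master's in-memory set of current sync standbys.
    """
    if standby_pos < master_pos:
        # Behind the master: it must catch up before it can be sync.
        return "catchup"
    if name in configured_sync:
        active_sync.add(name)
        return "sync"
    return "async"


def mark_degraded(name, active_sync):
    """Master side only: drop the standby from the in-memory set."""
    active_sync.discard(name)
```

A standby that reconnects after being dropped goes through on_connect again, exactly as it would on first start, so nothing has to be saved on the standby.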
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services