Re: Issues with Quorum Commit

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Issues with Quorum Commit
Date:	2010-10-07 23:44:27
Message-ID:	4CAE5B5B.2090600@2ndquadrant.com
Lists:	pgsql-hackers

Markus Wanner wrote:
> So far I've been under the impression that Simon already has the code
> for quorum_commit k = 1.
>
> What I'm opposing to is the timeout "feature", which I consider to be
> additional code, unneeded complexity and foot-gun.
>

Additional code? Yes. Foot-gun? Yes. Timeout should be disabled by
default so that you wait forever unless you ask for something
different? Probably. Unneeded? This is where we don't agree anymore.
The example that Josh Berkus just sent to the list is a typical example
of what I expect people to do here. They'll use Sync Rep to maximize
the odds a system failure doesn't cause any transaction loss. They'll
use good quality hardware on the master so it's unlikely to fail. But
when the database finds the standby unreachable, and it's left with the
choice between either degrading into async rep or coming to a complete
halt, you must give people the option of choosing to degrade instead
after a timeout. Let them set off the red flashing lights, sound the
alarms, and pray the master doesn't go down until you can fix the
problem. But the choice to allow uptime concerns to win over the normal
sync rep preferences, that's a completely valid business decision people
will absolutely want to make in a way opposite of your personal
preference here.

I don't see this as needing any implementation any more complicated than
the usual way such timeouts are handled. Note how long you've been
trying to reach the standby. Default to -1 for forever. And if you hit
the timeout, mark the standby as degraded and force it to do a proper
resync when it reconnects. Once that's done, it can re-enter sync rep
mode again, via the same process a new node would use.
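
To make that concrete, here is a rough sketch of the bookkeeping described above. It is illustrative Python, not PostgreSQL code: the names StandbyTracker, timeout_s, commit_must_wait and resync_complete are all made up for this example, and the only things taken from the proposal are the -1 default meaning "wait forever", the degraded marking once the timeout is hit, and the requirement of a full resync before returning to sync rep.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StandbyState(Enum):
    SYNC = "sync"          # commits wait for this standby's acknowledgement
    DEGRADED = "degraded"  # timed out; treated as async until it resyncs


@dataclass
class StandbyTracker:
    """Hypothetical per-standby bookkeeping; not an actual PostgreSQL structure."""
    timeout_s: float = -1.0                    # -1 means wait forever (assumed default)
    state: StandbyState = StandbyState.SYNC
    unreachable_since: Optional[float] = None  # when we first failed to reach it

    def note_unreachable(self, now: float) -> None:
        # Record only the start of the outage, so elapsed time keeps accumulating.
        if self.unreachable_since is None:
            self.unreachable_since = now

    def note_ack(self, now: float) -> None:
        # An ack clears the timer, but does not undo a degraded marking.
        self.unreachable_since = None

    def commit_must_wait(self, now: float) -> bool:
        """Return True if a commit should keep waiting on this standby."""
        if self.state is StandbyState.DEGRADED:
            return False                       # already degraded to async
        if self.unreachable_since is None:
            return True                        # healthy: wait for the ack as usual
        if self.timeout_s < 0:
            return True                        # -1: wait forever
        if now - self.unreachable_since >= self.timeout_s:
            self.state = StandbyState.DEGRADED # timeout hit: stop blocking commits
            return False
        return True

    def resync_complete(self) -> None:
        # Only a full resync, as for a new node, re-enters sync rep mode.
        self.state = StandbyState.SYNC
        self.unreachable_since = None
```

With timeout_s=30, a standby that became unreachable at t=100 still blocks commits at t=110 and is marked degraded at t=130; with the default of -1 it blocks indefinitely. The alarms and flashing lights would hang off the transition to DEGRADED.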

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book

In response to

Re: Issues with Quorum Commit at 2010-10-07 17:50:40 from Markus Wanner

Responses

Re: Issues with Quorum Commit at 2010-10-08 05:30:03 from Joshua D. Drake
Re: Issues with Quorum Commit at 2010-10-08 05:52:09 from Fujii Masao
Re: Issues with Quorum Commit at 2010-10-08 07:13:55 from Dimitri Fontaine
Re: Issues with Quorum Commit at 2010-10-08 09:02:49 from Markus Wanner
Re: Issues with Quorum Commit at 2010-10-08 14:11:58 from Tom Lane
