Re: Issues with Quorum Commit

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issues with Quorum Commit
Date: 2010-10-07 23:44:27
Message-ID: 4CAE5B5B.2090600@2ndquadrant.com
Lists: pgsql-hackers

Markus Wanner wrote:
> So far I've been under the impression that Simon already has the code
> for quorum_commit k = 1.
>
> What I'm opposing to is the timeout "feature", which I consider to be
> additional code, unneeded complexity and foot-gun.
>

Additional code? Yes. Foot-gun? Yes. Should the timeout be disabled by
default, so that you wait forever unless you ask for something
different? Probably. Unneeded? This is where we don't agree anymore.
The example that Josh Berkus just sent to the list is a typical example
of what I expect people to do here. They'll use Sync Rep to maximize
the odds that a system failure doesn't cause any transaction loss.
They'll use good quality hardware on the master so it's unlikely to
fail. But when the database finds the standby unreachable, and it's
left with the choice between degrading into async rep and coming to a
complete halt, you must give people the option of choosing to degrade
after a timeout. Let them set off the red flashing lights, sound the
alarms, and pray the master doesn't go down until they can fix the
problem. But choosing to let uptime concerns win over the normal sync
rep preferences is a completely valid business decision, and one people
will absolutely want to make, even though it's the opposite of your
personal preference here.

I don't see this needing an implementation any more complicated than
the usual way such timeouts are handled. Note how long you've been
trying to reach the standby. Default the timeout to -1, meaning wait
forever. If you hit the timeout, mark the standby as degraded and force
it to do a proper resync when it reconnects. Once that's done, it can
re-enter sync rep mode via the same process a new node would go
through.
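To make the proposed timeout behavior concrete, here is a minimal sketch of the state machine described above, written as a toy Python model rather than PostgreSQL C code. Everything in it is hypothetical: the class SyncStandby, the StandbyState values, the timeout_secs parameter and the WAIT_FOREVER = -1 default are illustrative names, not an existing PostgreSQL setting or API. It only shows the logic: wait for the standby's acknowledgement, give up after the timeout if one is set, mark the standby degraded, and require a full resync before it rejoins sync rep.

```python
from enum import Enum


class StandbyState(Enum):
    SYNC = "sync"            # commits wait for this standby's acknowledgement
    DEGRADED = "degraded"    # timed out: master commits without waiting (async)
    RESYNCING = "resyncing"  # reconnected: catching up like a brand-new node


WAIT_FOREVER = -1  # hypothetical default: never give up on the standby


class SyncStandby:
    """Hypothetical model of one synchronous standby, as seen by the master."""

    def __init__(self, timeout_secs=WAIT_FOREVER):
        self.timeout_secs = timeout_secs
        self.state = StandbyState.SYNC
        self.waiting_since = None  # when we started waiting for an ack

    def commit_should_wait(self, now, ack_received):
        """Return True if the committing transaction must keep waiting."""
        if self.state is not StandbyState.SYNC:
            return False  # degraded or resyncing: behave like async rep
        if ack_received:
            self.waiting_since = None
            return False
        if self.waiting_since is None:
            self.waiting_since = now  # note when we started trying
        if (self.timeout_secs != WAIT_FOREVER
                and now - self.waiting_since >= self.timeout_secs):
            # Timeout hit: degrade instead of halting. This is where the
            # admin alert ("red flashing lights") would be raised.
            self.state = StandbyState.DEGRADED
            self.waiting_since = None
            return False
        return True

    def on_reconnect(self):
        # A degraded standby may not rejoin directly: it must resync first.
        if self.state is StandbyState.DEGRADED:
            self.state = StandbyState.RESYNCING

    def on_caught_up(self):
        # Resync finished: re-enter sync rep the same way a new node would.
        if self.state is StandbyState.RESYNCING:
            self.state = StandbyState.SYNC
```

With timeout_secs=10, calls at now=0 and now=5 without an acknowledgement keep the commit waiting, the call at now=10 degrades the standby and lets the commit proceed, and the standby only returns to SYNC after on_reconnect() followed by on_caught_up(). With the default of WAIT_FOREVER, commits wait indefinitely, which matches the "disabled by default" behavior discussed above.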

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book
