Re: Issues with Quorum Commit

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issues with Quorum Commit
Date: 2010-10-05 22:14:22
Message-ID: 4CABA33E.8020802@agliodbs.com
Lists: pgsql-hackers

Simon, Robert,

> The points appear to be directed at "quorum commit", which is a name
> I've used. But most of the points apply more to Fujii's patch than my
> own.

Per previous discussion, I'm trying to get at what reasonable behavior
is, rather than targeting one patch or the other.

> I can only presume that Josh wants to prevent us from adopting a
> design that allows sync against multiple standbys.

Quorum commit == "X servers need to ack for commit", where X > 1.
Usually done as "X out of Y servers must ack", but it's not a given that
the master needs to know how many servers there are, just how many ack'ed.
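
To make that definition concrete, here's a minimal sketch in Python. It is
purely illustrative and not taken from either patch; the function name
`quorum_reached` and the standby labels are made up for the example. The
point is that the check only needs the set of standbys that have ack'ed so
far, not the total number of standbys:

```python
def quorum_reached(acks, required):
    """Return True once at least `required` distinct standbys have acked.

    `acks` is the list of standby names (hypothetical labels) that have
    acknowledged this commit so far; duplicates count only once.
    """
    return len(set(acks)) >= required


# The master tracks only who has acked this commit,
# not how many standbys exist in total.
acked = []
for standby in ["standby_a", "standby_a", "standby_b"]:
    acked.append(standby)
    if quorum_reached(acked, required=2):
        print("commit can be acknowledged to the client")
        break
    print("still waiting for more acks")
```

Note that a repeated ack from the same standby doesn't count twice, which
is the one assumption baked into this sketch.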

And I'm not against it; I'm just pointing out that it gives us some
issues which we don't have with a single standby, and thus quorum commit
ought to be treated as a separate feature in 9.1 development.

>> The master can not roll back or cancel the transaction. That's
>> completely infeasible, the WAL record has been written to local disk
>> already. The best it can do is halt and wait for enough standbys to
>> appear to fulfill the quorum. The client will hang waiting for the
>> COMMIT to finish, and the transaction will appear as in-progress to
>> other transactions.
>
> Yes, that point has long been understood. Neither patch does this, and
> in fact the issue is a completely general one.

So, in that case, if it's been 10 minutes, and we're still not getting
ack from standbys, what's the exit strategy for the hapless DBA?
Practically speaking? Without restarting the master?

Last I checked, our goal with synch standby was to increase availability,
not decrease it. This is, however, not an issue with quorum commit, but
an issue with sync rep in general.

> Could the person that wrote that actually explain what a "specific
> window of synchronicity" is? I'm not sure whether to agree, or disagree.

A specific amount of time within which all nodes will be consistent
regarding that specific transaction.

>> You start a new one from the latest base backup and let it catch up?
>> Possibly modifying the config file in the master to let it know about
>> the new standby, if we go down that path. This part doesn't seem
>> particularly hard to me.
>
> Agreed, not sure of the issue there.

See previous post. The critical phrase is *without restarting the
master*. AFAICT, no patch has addressed the need to change the master's
synch configuration without restarting it. It's possible that I'm not
following something, in which case I'd love to have it pointed out.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
