Re: Proposal: Commit timestamp

From: Theo Schlossnagle <jesus(at)omniti(dot)com>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Jim Nasby <decibel(at)decibel(dot)org>
Subject: Re: Proposal: Commit timestamp
Date: 2007-02-04 19:04:27
Message-ID: E481BDD9-11A3-406B-8B3C-E04DE26A3350@omniti.com
Lists: pgsql-hackers


On Feb 4, 2007, at 1:36 PM, Jan Wieck wrote:

> On 2/4/2007 10:53 AM, Theo Schlossnagle wrote:
>> As the clock must be incremented clusterwide, the need for it to
>> be in sync with the system clock (on any or all of the systems)
>> is obviated. In fact, because you can't guarantee that synchronicity,
>> it can be confusing -- one expects a time-based clock
>> to be accurate to the time. A counter-based clock has no such
>> expectations.
>
> For the fourth time, the clock is in the mix to allow operation to continue
> during a network outage. All your arguments seem to assume 100%
> network uptime. There will be no clusterwide clock or clusterwide
> increment when you lose connection. How does your idea cope with that?

That's exactly what a quorum algorithm is for.
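A minimal sketch of what a quorum rule buys during an outage, assuming a simple strict-majority vote over a fixed, known membership (real systems may weight votes or use other rules; the node names are hypothetical). Because two disjoint partitions can never both hold a strict majority, at most one side keeps accepting writes and every other side stays read-only, so there are no conflicting writes to reconcile when the network comes back.

```python
# Hypothetical sketch: a strict-majority quorum rule for a partitioned cluster.
# Node names and cluster membership are illustrative assumptions.

def has_quorum(reachable: set[str], members: set[str]) -> bool:
    """Return True if the reachable nodes form a strict majority of members."""
    # Only count reachable nodes that are actually cluster members.
    votes = len(reachable & members)
    return votes * 2 > len(members)


members = {"a", "b", "c", "d", "e"}

# A network split into {a, b, c} and {d, e}: only the larger side may write.
left = {"a", "b", "c"}
right = {"d", "e"}
print(has_quorum(left, members))   # True: 3 of 5 members reachable
print(has_quorum(right, members))  # False: 2 of 5, so this side is read-only
```

Note that an even split (2 of 4 on each side) leaves neither side with a quorum, which is the price of guaranteeing a single writer.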

> Obviously the counters will immediately drift apart based on the
> transaction load of the nodes as soon as the network goes down. And
> in order to avoid this "clock" confusion and wrong expectation,
> you'd rather have a system with such a simple, non-clock based
> counter and accept that it starts behaving totally wonky when the
> cluster reconnects after a network outage? I'd rather confuse a few
> people than have a last-update-wins conflict resolution that
> basically rolls dice to determine "last".

If your cluster partitions, you have hours of independent action,
and upon merge you apply a conflict resolution algorithm that has an
enormous effect, undoing portions of the last several hours of work on
the nodes, wouldn't you call that "wonky"?
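To make the disagreement concrete, here is a hypothetical last-update-wins merge that orders conflicting writes by a per-node transaction counter. The `Version` record, the counter values, and the node names are all illustrative assumptions, not anything taken from Slony or PostgreSQL. Once the partition heals, whichever side has processed more transactions wins each conflict, regardless of which write actually happened later, and the losing side's work is silently discarded.

```python
# Hypothetical sketch: last-update-wins merge keyed on a per-node counter.
from dataclasses import dataclass


@dataclass
class Version:
    value: str
    counter: int  # local transaction counter at the time of the write (assumed)
    node: str     # tie-breaker so the merge is deterministic


def merge(a: dict[str, Version], b: dict[str, Version]) -> dict[str, Version]:
    """Keep, per key, the version with the larger (counter, node) pair."""
    merged = dict(a)
    for key, v in b.items():
        cur = merged.get(key)
        if cur is None or (v.counter, v.node) > (cur.counter, cur.node):
            merged[key] = v
    return merged


# Both nodes are at counter 100 when the network splits.
# Node "x" is busy: 500 transactions go by before it writes row1.
# Node "y" is quiet and writes row1 later in wall-clock time, at counter 101.
side_x = {"row1": Version("written on x", 600, "x")}
side_y = {"row1": Version("written on y", 101, "y")}

result = merge(side_x, side_y)
print(result["row1"].value)  # prints: written on x (the busier node wins)
```

The merge is order-independent, so every node converges on the same result; what it cannot do is recover the intent that the later write on "y" should have won.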

For sane disconnected (or, more generally, partitioned) operation in
multi-master environments, a quorum for the dataset must be
established. Now, one can consider the "database" to be the
dataset. Then, during a network partition, the nodes in "the" quorum
may continue modifying data while the others can only read. However,
there is no reason why the dataset _must_ be the database, nor why
multiple datasets _must_ share the same quorum algorithm. You could
easily classify certain tables, schemas, or partitions into a
specific dataset and apply a suitable quorum algorithm to it, and a
different quorum algorithm to other, disjoint datasets.
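One way to picture per-dataset quorums, assuming a hypothetical mapping from tables or schemas to named datasets, each carrying its own quorum rule. Here `accounts` uses a strict majority and `eu_sessions` is writable only in the partition containing a designated home node; both dataset names and both rules are illustrative choices. During a partition, each side decides per dataset whether it may write, so different sides can keep modifying different datasets while each dataset still has at most one writable side.

```python
# Hypothetical sketch: independent quorum rules per dataset.
from typing import Callable

QuorumRule = Callable[[set[str]], bool]


def majority_of(members: set[str]) -> QuorumRule:
    # Writes allowed only where a strict majority of `members` is reachable.
    return lambda reachable: len(reachable & members) * 2 > len(members)


def home_node(node: str) -> QuorumRule:
    # Writes allowed only in the partition that contains one designated node.
    return lambda reachable: node in reachable


# Illustrative mapping of tables/schemas to datasets and their quorum rules.
DATASETS: dict[str, QuorumRule] = {
    "accounts": majority_of({"a", "b", "c", "d", "e"}),
    "eu_sessions": home_node("d"),
}


def writable(dataset: str, reachable: set[str]) -> bool:
    """Return True if this partition may modify `dataset`; else it is read-only."""
    return DATASETS[dataset](reachable)


# A network split into {a, b, c} and {d, e}.
for side in ({"a", "b", "c"}, {"d", "e"}):
    print(sorted(side), {name: writable(name, side) for name in DATASETS})
```

With this split, the {a, b, c} side keeps writing `accounts` and the {d, e} side keeps writing `eu_sessions`, and neither dataset needs conflict resolution on reconnect.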

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
