Re: Proposal: Commit timestamp

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Theo Schlossnagle <jesus(at)omniti(dot)com>, Jim Nasby <decibel(at)decibel(dot)org>
Subject: Re: Proposal: Commit timestamp
Date: 2007-02-04 15:06:27
Message-ID: 45C5F673.2070307@Yahoo.com
Lists: pgsql-hackers

On 2/4/2007 3:16 AM, Peter Eisentraut wrote:
> Jan Wieck wrote:
>> This is all that is needed for last update wins resolution. And as
>> said before, the only reason the clock is involved in this is so that
>> nodes can continue autonomously when they lose connection without
>> conflict resolution going crazy later on, which it would do if they
>> were simple counters. It doesn't require microsecond synchronized
>> clocks and the system clock isn't just used as a Lamport timestamp.
>
> Earlier you said that "one assumption is that all servers in the
> multimaster cluster are ntp synchronized", which already rung the alarm
> bells in me. Now that I read this you appear to require
> synchronization not on the microsecond level but on some level. I
> think that would be pretty hard to manage for an administrator, seeing
> that NTP typically cannot provide such guarantees.

Some degree of synchronization is wanted to avoid totally unexpected
behavior. The conflict resolution algorithm itself can live perfectly
well with plain counters, but I guess you wouldn't want the result.
Say you update a record on one node, and 10 minutes later you update
the same record on another node. Unfortunately, the nodes had no
communication in between, and because the first node is much busier, its
counter is far ahead. The update made 10 minutes later would then be
lost in conflict resolution when the nodes reestablish communication.
Both nodes would end up with the same data, just not the data any sane
person would expect.
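
To make the counter scenario concrete, here is a minimal sketch in
Python. Everything in it (the Version record, last_update_wins, the
counter values, breaking ties by node name) is made up for illustration
and is not the proposed implementation; it only shows how a plain
per-node counter lets the busier node's older update win.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str   # row contents
    stamp: int   # ordering key used by conflict resolution
    node: str    # hypothetical tie-breaker if stamps are equal


def last_update_wins(a: Version, b: Version) -> Version:
    # Keep the version with the higher stamp; node name breaks ties.
    return max(a, b, key=lambda v: (v.stamp, v.node))


# Node A is busy: its plain counter is already at 50000 when it
# updates the row. Node B is idle: ten minutes later its counter
# has only reached 120.
first = Version("written on A", stamp=50_000, node="A")
later = Version("written on B, 10 minutes later", stamp=120, node="B")

winner = last_update_wins(first, later)
print(winner.value)
```

Both nodes agree on the winner, so the data converges, but the winner
is the older update from node A.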

This behavior kicks in whenever conflicting updates on different nodes
happen close enough together that the difference between the clocks
can affect the outcome. So if you update the same logical row on two
nodes within a tenth of a second, and the clocks are more than that
apart, conflict resolution can let the older row survive. Clock
synchronization is simply used to minimize this.
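
The relation between clock skew and the gap between two conflicting
updates can be sketched as follows. The function name and the
millisecond numbers are assumptions chosen for illustration: node A
updates first, node B updates gap_ms later, and A's clock runs skew_ms
ahead of B's.

```python
def surviving_node(gap_ms: int, skew_ms: int) -> str:
    """Return the node whose row survives last-update-wins.

    Node A updates at true time 0, node B at true time gap_ms.
    A's clock is skew_ms ahead of B's, and B's clock is taken as
    the true time. A stamp tie goes to B in this sketch.
    """
    stamp_a = 0 + skew_ms   # what A's clock reads at its update
    stamp_b = gap_ms        # what B's clock reads at its update
    return "A" if stamp_a > stamp_b else "B"


# Updates 100 ms apart, clocks 250 ms apart: the older row (A) survives.
print(surviving_node(gap_ms=100, skew_ms=250))
# Same updates, clocks only 20 ms apart: the newer row (B) survives.
print(surviving_node(gap_ms=100, skew_ms=20))
# Updates 10 minutes apart: a 250 ms skew no longer matters.
print(surviving_node(gap_ms=600_000, skew_ms=250))
```

The wrong row can only win when the skew exceeds the gap between the
updates, which is why tighter synchronization shrinks the window but
is not needed for correctness of the algorithm itself.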

The system clock is used only to keep the counters somewhat synchronized
during a connection loss, so that "last update" retains some of its
meaning. Without that, continuing autonomously during a network outage
is just not practical.
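
One way to picture a counter that is kept roughly in step by the
system clock is the sketch below. It is an assumption about the
general idea, not the proposed code: the class name, the microsecond
unit, and the observe() step (which moves the counter past a stamp
seen from another node) are all hypothetical.

```python
from typing import Callable


class ClockDrivenCounter:
    """Counter that never goes backwards and follows the local clock."""

    def __init__(self, clock: Callable[[], int]) -> None:
        self._clock = clock   # returns integer microseconds
        self._last = 0

    def next(self) -> int:
        # Use the clock when it has moved on; otherwise just count up.
        self._last = max(self._clock(), self._last + 1)
        return self._last

    def observe(self, remote_stamp: int) -> None:
        # Hypothetical: never issue a stamp below one already seen.
        self._last = max(self._last, remote_stamp)


# Busy node: 1000 commits while its (fake) clock reads 1_000_000 us.
busy = ClockDrivenCounter(lambda: 1_000_000)
busy_stamps = [busy.next() for _ in range(1000)]

# Idle, disconnected node: one commit ten minutes later.
idle = ClockDrivenCounter(lambda: 601_000_000)
idle_stamp = idle.next()

print(busy_stamps[-1], idle_stamp)
```

Even though the busy node issued a thousand stamps and the idle node
only one, the later update carries the higher stamp, because both
counters are pulled along by their clocks instead of by their own
activity.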

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #
