Re: Proposal: Commit timestamp

From: Theo Schlossnagle <jesus(at)omniti(dot)com>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, Jim Nasby <decibel(at)decibel(dot)org>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Commit timestamp
Date: 2007-02-03 21:58:27
Message-ID: 4A2CFDE6-4965-42AB-BD2A-9DAA9B25FC36@omniti.com
Lists: pgsql-hackers
On Feb 3, 2007, at 4:38 PM, Jan Wieck wrote:

> On 2/3/2007 4:05 PM, Theo Schlossnagle wrote:
>> On Feb 3, 2007, at 3:52 PM, Jan Wieck wrote:
>>> On 2/1/2007 11:23 PM, Jim Nasby wrote:
>>>> On Jan 25, 2007, at 6:16 PM, Jan Wieck wrote:
>>>>> If a per-database configurable tslog_priority is given, the
>>>>> timestamp will be truncated to milliseconds and the increment
>>>>> logic is done on milliseconds. The priority is added to the
>>>>> timestamp. This guarantees that no two timestamps for commits
>>>>> will ever be exactly identical, even across different servers.
>>>> Wouldn't it be better to just store that information
>>>> separately, rather than mucking with the timestamp?
>>>> Though, there's another issue here... I don't think NTP is good
>>>> for any better than a few milliseconds, even on a local network.
>>>> How exact does the conflict resolution need to be, anyway?
>>>> Would it really be a problem if transaction B committed 0.1
>>>> seconds after transaction A yet the cluster thought it was the
>>>> other way around?
>>>
>>> Since the timestamp is basically a Lamport counter which is just
>>> bumped by the clock as well, it doesn't need to be too precise.
>> Unless I'm missing something, you are _treating_ the counter as a
>> Lamport timestamp, when in fact it is not and thus does not
>> provide the semantics of a Lamport timestamp. As such, any
>> algorithms that use Lamport timestamps as a basis or assumption
>> for the proof of their correctness will not translate (provably)
>> to this system.
>> How is your counter semantically equivalent to Lamport timestamps?
>
> Yes, you must be missing something.
>
> The last used timestamp is remembered. When a remote transaction is  
> replicated, the remembered timestamp is set to max(remembered,  
> remote). For a local transaction, the remembered timestamp is set  
> to max(remembered+1ms, systemclock) and that value is used as the  
> transaction commit timestamp.
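
For readers following the thread, the update rule quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: timestamps are integer milliseconds, and the names CommitClock, observe_remote, local_commit and now_ms are hypothetical and do not come from the proposal.

```python
import time


class CommitClock:
    """Illustrative only: remembers the last used commit timestamp (ms)."""

    def __init__(self, now_ms=None):
        # now_ms is a hypothetical injectable clock so the rule can be
        # exercised deterministically; by default it reads the system clock.
        self._now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self.remembered = 0

    def observe_remote(self, remote_ms):
        # Replicated remote transaction: remembered = max(remembered, remote).
        self.remembered = max(self.remembered, remote_ms)

    def local_commit(self):
        # Local transaction: remembered = max(remembered + 1ms, systemclock);
        # the result is used as the commit timestamp.
        self.remembered = max(self.remembered + 1, self._now_ms())
        return self.remembered
```

With a clock frozen at 1000 ms, two local commits would get 1000 and 1001; after observing a remote timestamp of 5000, the next local commit would get 5001. The sketch says nothing about whether the scheme has Lamport semantics, which is the question raised below.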

A Lamport clock, IIRC, requires a cluster-wide tick.  This seems based
only on activity and is thus an observational tick only, which means
various nodes can have various perspectives at different times.

Given that time skew is prevalent, why is the system clock involved  
at all?

As is usual with distributed systems problems, this one is very hard to
explain casually and also hard to review from a theoretical angle
without a proof.  Are you basing this off a paper?  If so, which one?
If not, have you written a rigorous proof of correctness for this
approach?

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/


