Re: Proposal: Commit timestamp

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Commit timestamp
Date: 2007-01-26 05:41:55
Message-ID: 45B994A3.8090804@Yahoo.com
Lists: pgsql-hackers

On 1/25/2007 11:41 PM, Bruce Momjian wrote:
> Jan Wieck wrote:
>> On 1/25/2007 6:49 PM, Tom Lane wrote:
>> > Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
>> >> To provide this data, I would like to add another "log" directory,
>> >> pg_tslog. The files in this directory will be similar to the clog, but
>> >> contain arrays of timestamptz values.
>> >
>> > Why should everybody be made to pay this overhead?
>>
>> It could be made an initdb time option. If you intend to use a product
>> that requires this feature, you will be willing to pay that price.
>
> That is going to cut your usage by like 80%. There must be a better
> way.

I'd love to find one.

But it is a datum that needs to be collected at basically the moment the
clog entry is made ... I don't think any external module can ever do
that.

You know how long I've been in and out and back into replication again.
The one thing that pops up again and again in all the scenarios is "what
the heck was the commit order?". Now the pure commit order for a single
node could certainly be recorded from a sequence, but that doesn't cover
the multi-node environment I am after. That's why I want it to be a
timestamp with a few fudged bits at the end. If you look at what I've
described, you will notice that as long as all node priorities are
unique, this timestamp will be a globally unique ID in a somewhat
ascending order along a timeline. That is what replication people are
looking for.
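
To make the "fudged bits" idea concrete, here is a rough sketch in Python (not backend code). It assumes a microsecond clock value whose low bits are overwritten with the node priority, and it bumps the value whenever the clock has not advanced past the last stamp issued on that node. The 8-bit width, the CommitStamper class and its stamp() method are all made up for illustration; none of them are part of the proposal.

```python
# Hypothetical illustration only: names and bit width are assumptions.
PRIORITY_BITS = 8                          # assumed number of "fudged" low bits
PRIORITY_MASK = (1 << PRIORITY_BITS) - 1


class CommitStamper:
    """Issues per-node commit stamps: clock value with node priority in the low bits."""

    def __init__(self, node_priority):
        if not 0 <= node_priority <= PRIORITY_MASK:
            raise ValueError("node priority does not fit in the fudged bits")
        self.node_priority = node_priority
        self.last = 0                      # last stamp issued on this node

    def stamp(self, clock_us):
        # Overwrite the low bits of the clock reading with the node priority.
        candidate = (clock_us & ~PRIORITY_MASK) | self.node_priority
        # If the clock did not move forward enough, step past the last stamp.
        # Adding a full 2**PRIORITY_BITS keeps the priority bits intact.
        if candidate <= self.last:
            candidate = self.last + (1 << PRIORITY_BITS)
        self.last = candidate
        return candidate


if __name__ == "__main__":
    node_a = CommitStamper(node_priority=1)
    node_b = CommitStamper(node_priority=2)
    now = 1_000_000                        # same clock reading on both nodes
    print(node_a.stamp(now), node_a.stamp(now), node_b.stamp(now))
```

Two nodes reading the same clock value still produce different stamps because their priorities differ, and a single node never hands out the same stamp twice. The ordering across nodes is only as good as the clock synchronization, which is why I call it "somewhat" ascending.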

Tom fears that the overhead is significant, which I do understand and,
frankly, wonder about myself (I don't even have a vague estimate). I
really think we should make this thing an initdb option and decide later
if it's on or off by default. We can probably even implement it in a way
that lets one turn it on or off, where a postmaster restart plus waiting
out the desired freeze delay would do.

What I know for certain is that no async replication system can ever do
without the commit timestamp information. Using the transaction start
time or even the single statement's timeofday will only lead to
inconsistencies all over the place (I haven't been absent from the
mailing lists for the past couple of months hiding in my closet ... I've
been experimenting and trying to get around all these issues - in my
closet). Slony-I can survive without that information because everything
happens on one node and we record snapshot information for later abuse.
But look at the cost at which we are dealing with this rather trivial
issue. All we need to know is the serializable commit order. And we have
to issue queries that eventually might exceed address space limits?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #
