Re: Exposing the Xact commit order to the user

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Exposing the Xact commit order to the user
Date: 2010-05-26 20:32:53
Message-ID: 4BFD8575.30406@Yahoo.com
Lists: pgsql-hackers

On 5/26/2010 3:16 PM, Heikki Linnakangas wrote:
> On 26/05/10 21:43, Jan Wieck wrote:
>> On 5/26/2010 1:17 PM, Heikki Linnakangas wrote:
>>> It would not get called during recovery, but I believe that would be
>>> sufficient for Slony. You could always batch commits that you don't
>>> know when they committed as if they committed simultaneously.
>>
>> Here you are mistaken. If the origin crashes but can recover not yet
>> flushed to xlog-commit-order transactions, then the consumer has no idea
>> about the order of those commits, which throws us back to the point
>> where we require a non cacheable global sequence to replay the
>> individual actions of those "now batched" transactions in an agreeable
>> order.
>>
>> The commit order data needs to be covered by crash recovery.
>
> Perhaps I'm missing something,

Apparently. More about that at the end.

> I'm thinking that the commit-order log would contain two kinds of records:
>
> a) Transaction with XID X committed
> b) All transactions with XID < X committed

If that were true, long-running transactions would delay all commits
of transactions that started after them. Do they?

>
> During normal operation we write the 1st kind of record at every commit.
> After crash recovery (perhaps at the first commit after recovery or when
> the slon daemon first polls the server, as there's no hook for
> end-of-recovery), we write the 2nd kind of record.

I think the callback is also called during backend startup, which means
that it could record the first XID to come, which is known from the
control file. In that case, all transactions with an XID below that one
are either committed or aborted.

Which leads us to the piece you are missing above: the need for the
global non-cacheable sequence.
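
To make the two record kinds concrete, here is a rough sketch of how a
consumer might turn such a commit-order log into apply batches. The record
classes, the batches() helper and the set of XIDs with captured row changes
are hypothetical names for illustration only, not anything that exists in
PostgreSQL or Slony, and XID wraparound is ignored. The point is that a
kind (b) record can only yield one batch whose internal commit order is
unknown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Union


@dataclass(frozen=True)
class Committed:
    """Record kind (a): transaction `xid` committed (hypothetical layout)."""
    xid: int


@dataclass(frozen=True)
class AllBelowResolved:
    """Record kind (b): every transaction with XID < `xid` is finished."""
    xid: int


LogRecord = Union[Committed, AllBelowResolved]


def batches(log: Iterable[LogRecord], replicated_xids: Set[int]) -> List[List[int]]:
    """Group XIDs into apply batches, following the log order.

    `replicated_xids` is assumed to be the set of XIDs that have visible
    captured row changes. A kind (a) record gives a batch of one XID, so
    its position is exact. A kind (b) record sweeps up every not yet seen
    XID below its bound into a single batch; the commit order inside that
    batch is not known to the consumer.
    """
    seen: Set[int] = set()
    result: List[List[int]] = []
    for rec in log:
        if isinstance(rec, Committed):
            if rec.xid not in seen:
                result.append([rec.xid])
                seen.add(rec.xid)
        else:
            group = sorted(x for x in replicated_xids
                           if x < rec.xid and x not in seen)
            if group:
                result.append(group)
                seen.update(group)
    return result


# XIDs 103 and 105 committed just before a crash; their kind (a) records
# never reached the commit-order log, so they end up batched together.
log = [Committed(100), Committed(102), AllBelowResolved(106), Committed(106)]
plan = batches(log, {100, 102, 103, 105, 106})
```

In this sketch the list [103, 105] is sorted by XID only because some
order has to be picked; nothing in the log says that is the order in which
they committed.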

Consider two transactions A and B that, due to transaction batching
between snapshots, get applied together. Let the order of actions be:

1. A starts
2. B starts
3. B selects a row for update, then updates the row
4. A tries to do the same and blocks
5. B commits
6. A gets the lock, the row, does the update
7. A commits

If Slony (or Londiste) did not record the exact order of those
individual row actions, it would have no idea whether, within that
batch, the action of B (higher XID) actually came first. Without that
knowledge there is a 50/50 chance of getting your replica out of sync
with that simple conflict.
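
The scenario above can be played through in a few lines. This is only an
illustrative sketch: the RowAction layout, the seq field standing in for
the global non-cacheable sequence, the XID values and the row values are
all made up for the example and are not Slony's actual log format.

```python
from typing import Callable, Dict, List, NamedTuple


class RowAction(NamedTuple):
    seq: int        # position in a hypothetical global, non-cached sequence
    xid: int        # transaction that performed the action
    key: str        # primary key of the affected row
    new_value: int  # value the row had after the update


def replay(actions: List[RowAction], order_key: Callable) -> Dict[str, int]:
    """Apply captured row updates to an empty replica in the given order."""
    table: Dict[str, int] = {}
    for act in sorted(actions, key=order_key):
        table[act.key] = act.new_value
    return table


XID_A, XID_B = 10, 11  # A started first, so it has the lower XID

# B updates the row first (step 3); A blocks and updates after B commits
# (step 6). Both transactions land in the same batch.
batch = [
    RowAction(seq=1, xid=XID_B, key="row1", new_value=110),
    RowAction(seq=2, xid=XID_A, key="row1", new_value=220),
]

by_seq = replay(batch, lambda a: a.seq)           # order of the row actions
by_xid = replay(batch, lambda a: (a.xid, a.seq))  # XID order, A before B
```

Replaying by seq leaves row1 at 220, which is what the origin ends up
with. Replaying by XID applies A's update first and B's last, leaving row1
at 110, so the replica is out of sync.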

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
