Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

From: MARK CALLAGHAN <mdcallag(at)gmail(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
Date: 2011-03-18 16:03:03
Message-ID: AANLkTi=v5n4ODwfUU+Df_BKpk49r_U=FMHtOnYUNPFa5@mail.gmail.com
Lists: pgsql-committers, pgsql-hackers

On Fri, Mar 18, 2011 at 2:37 PM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> Hi,
>
> On 03/18/2011 02:40 PM, Kevin Grittner wrote:
>> Then the only thing you would consider sync replication, as far as I
>> can see, is two phase commit
>
> I think waiting for the ACK before actually making the changes from the
> transaction visible (COMMIT) would suffice for disallowing such an
> inconsistency to manifest.  But obviously, MySQL decided it's not worth
> doing that, as it's such a rare event and a short period of time that
> may show inconsistencies...

There are fewer options for implementing this in MySQL because
replication requires a binlog on the master, and that requires the
internal use of XA to keep the binlog and InnoDB in sync, as they are
separate resource managers. In theory, this could be changed so that
commit is only forced for the binlog and, after a crash, missing
transactions are copied from the binlog to InnoDB, but I don't think
that will ever happen.

By "fewer options" I mean that commit in MySQL with InnoDB and the
binlog requires:
1) prepare to InnoDB (force transaction log to disk for changes from
this transaction)
2) write binlog events from this transaction to the binlog
3) write XID event to the binlog (at this point transaction commit is
official, will survive a crash)
4) force binlog to disk
5) release row locks held by transaction in InnoDB
6) write commit record to InnoDB transaction log
7) force write of commit record to disk
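
To make the ordering concrete, here is a minimal sketch of the seven steps
and of what a crash after each one would mean for the transaction. The names
(Step, outcome_after_crash) are made up for illustration and are not MySQL
identifiers. It assumes, as stated in step 3, that the transaction counts as
committed once its XID event is in the binlog; whether that write survives
before step 4 forces it to disk depends on the kind of crash, which this
sketch ignores.

```python
from enum import IntEnum


class Step(IntEnum):
    """The commit steps listed above, numbered as in the list."""
    PREPARE_INNODB = 1        # force InnoDB transaction log for this transaction
    WRITE_BINLOG_EVENTS = 2   # write the transaction's events to the binlog
    WRITE_XID_EVENT = 3       # commit becomes official here
    FORCE_BINLOG = 4          # force the binlog to disk
    RELEASE_ROW_LOCKS = 5     # release InnoDB row locks
    WRITE_COMMIT_RECORD = 6   # write commit record to InnoDB transaction log
    FORCE_COMMIT_RECORD = 7   # force the commit record to disk


def outcome_after_crash(last_completed):
    """Fate of a transaction if the server crashes right after
    `last_completed` (0 means no step finished).

    Assumption: the XID event written in step 3 is still readable after
    the crash, so recovery treats the prepared transaction as committed.
    """
    if last_completed >= Step.WRITE_XID_EVENT:
        return "committed"
    return "rolled back"


if __name__ == "__main__":
    for completed in range(0, 8):
        print(completed, outcome_after_crash(completed))
```

Note that row locks are only released in step 5, after the point at which
the commit is official.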

Group commit is done for the fsyncs from steps 1 and 7. It is not done
for the fsync done in step 4.
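
A rough way to see why the missing group commit in step 4 matters is to
count fsyncs for a batch of concurrent commits. This is an idealized sketch,
not a measurement: it assumes every transaction in the batch shares a single
fsync wherever group commit applies, and fsyncs_for_batch is a hypothetical
helper name.

```python
FORCING_STEPS = (1, 4, 7)  # the steps above that force a log to disk


def fsyncs_for_batch(n_transactions, group_commit_steps=(1, 7)):
    """Count fsyncs needed to commit `n_transactions` concurrently.

    Idealized assumption: a step with group commit costs one fsync for
    the whole batch; a step without it costs one fsync per transaction.
    """
    if n_transactions == 0:
        return 0
    total = 0
    for step in FORCING_STEPS:
        if step in group_commit_steps:
            total += 1               # one shared fsync for the batch
        else:
            total += n_transactions  # one fsync per transaction
    return total


if __name__ == "__main__":
    n = 10
    print("as described above:", fsyncs_for_batch(n))
    print("no group commit:   ", fsyncs_for_batch(n, ()))
    print("group commit on all:", fsyncs_for_batch(n, (1, 4, 7)))
```

Under these assumptions the step 4 fsync dominates as the batch grows, since
it is the only one that scales with the number of transactions.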

Regardless, the processing above is complicated even without
semi-sync. AFAIK, the semi-sync code runs after step 7, but I have not
looked at the official version of the semi-sync code in MySQL and my
memory of the work we did at Google is vague.

It is great if Postgres doesn't have this issue. It wasn't clear to me
from lurking on this list. I hope your docs highlight this behavior, as
not having the issue is a big deal.

-- 
Mark Callaghan
mdcallag(at)gmail(dot)com
