Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

From: MARK CALLAGHAN <mdcallag(at)gmail(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
Date: 2011-03-18 16:03:03
Message-ID: AANLkTi=v5n4ODwfUU+Df_BKpk49r_U=FMHtOnYUNPFa5@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Fri, Mar 18, 2011 at 2:37 PM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> Hi,
>
> On 03/18/2011 02:40 PM, Kevin Grittner wrote:
>> Then the only thing you would consider sync replication, as far as I
>> can see, is two phase commit
>
> I think waiting for the ACK before actually making the changes from the
> transaction visible (COMMIT) would suffice for disallowing such an
> inconsistency to manifest.  But obviously, MySQL decided it's not worth
> doing that, as it's such a rare event and a short period of time that
> may show inconsistencies...

There are fewer options for implementing this in MySQL because
replication requires a binlog on the master, and that in turn requires
the internal use of XA to keep the binlog and InnoDB in sync, as they
are separate resource managers. In theory, this could be changed so
that commit is only forced for the binlog and, after a crash, missing
transactions are copied from the binlog to InnoDB, but I don't think
this will ever change.
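
As a rough sketch of what keeping two resource managers in sync implies
at crash recovery, assuming the binlog acts as the coordinator: a
transaction left prepared in InnoDB is committed if its XID reached the
binlog and rolled back otherwise. The function and parameter names here
are hypothetical and only model the decision, not MySQL's actual code.

```python
# Hypothetical model of the recovery decision; not MySQL internals.
def recover(prepared_in_innodb, xids_in_binlog):
    """Split prepared transactions into (to_commit, to_rollback) XID lists."""
    in_binlog = set(xids_in_binlog)
    # Prepared and present in the binlog: the commit was official.
    to_commit = [xid for xid in prepared_in_innodb if xid in in_binlog]
    # Prepared but absent from the binlog: the commit never became official.
    to_rollback = [xid for xid in prepared_in_innodb if xid not in in_binlog]
    return to_commit, to_rollback


if __name__ == "__main__":
    commit, rollback = recover(
        prepared_in_innodb=[101, 102, 103],
        xids_in_binlog=[100, 101, 102],
    )
    print("commit:", commit)
    print("rollback:", rollback)
```

Under this assumption the binlog, not the InnoDB log, decides the
outcome of any transaction caught between prepare and commit.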

By "fewer options" I mean that commit in MySQL with InnoDB and the
binlog requires:
1) prepare to InnoDB (force transaction log to disk for changes from
this transaction)
2) write binlog events from this transaction to the binlog
3) write XID event to the binlog (at this point transaction commit is
official, will survive a crash)
4) force binlog to disk
5) release row locks held by transaction in InnoDB
6) write commit record to InnoDB transaction log
7) force write of commit record to disk

Group commit is done for the fsyncs from steps 1 and 7. It is not done
for the fsync done in step 4.
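
The seven steps and the group-commit remark can be sketched as a small
model that records which steps force a write to disk and which of those
benefit from group commit. This is illustrative only: the function
names and the trace structure are assumptions, and nothing here calls
into MySQL.

```python
# Illustrative model of the commit sequence described above.
# All names are hypothetical; this is not MySQL code.

def commit_with_binlog(xid):
    """Return the ordered steps as (number, action, fsync, group_commit)."""
    return [
        (1, f"prepare xid={xid} in InnoDB, force transaction log", True, True),
        (2, "write transaction events to binlog", False, False),
        (3, f"write XID event xid={xid} to binlog", False, False),
        (4, "force binlog to disk", True, False),
        (5, "release InnoDB row locks", False, False),
        (6, "write commit record to InnoDB transaction log", False, False),
        (7, "force commit record to disk", True, True),
    ]


def fsync_steps(steps, grouped):
    """Step numbers that fsync, filtered by whether group commit applies."""
    return [n for (n, _, fsync, gc) in steps if fsync and gc == grouped]


if __name__ == "__main__":
    steps = commit_with_binlog(42)
    print("fsyncs with group commit:   ", fsync_steps(steps, grouped=True))
    print("fsyncs without group commit:", fsync_steps(steps, grouped=False))
```

In this model each commit pays for three forced writes, and the one in
step 4 is the one that cannot be shared across concurrent transactions.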

Regardless, the processing above is complicated even without
semi-sync. AFAIK, the semi-sync code runs after step 7, but I have not
looked at the official version of the semi-sync code in MySQL and my
memory of the work we did at Google is vague.

It is great if Postgres doesn't have this issue. That wasn't clear to
me from lurking on this list. I hope your docs highlight the behavior,
as not having the issue is a big deal.

--
Mark Callaghan
mdcallag(at)gmail(dot)com
