Re: 2-phase commit

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2-phase commit
Date: 2003-09-26 17:20:43
Message-ID: 200309261720.h8QHKhq10420@candle.pha.pa.us
Lists: pgsql-hackers

Zeugswetter Andreas SB SD wrote:
>
> > > From our previous discussion of 2-phase commit, there was concern that
> > > the failure modes of 2-phase commit were not solvable. However, I think
> > > multi-master replication is going to have similar non-solvable failure
> > > modes, yet people still want multi-master replication.
> >
> > No. The real problem with 2PC in my mind is that its failure modes
> > occur *after* you have promised commit to one or more parties. In
> > multi-master, if you fail you know it before you have told the client
> > his data is committed.
>
> Hmm ? The appl cannot take the first phase commit as its commit info. It
> needs to wait for the second phase commit. The second phase is only finished
> when all coservers have reported back. 2PC is synchronous.
>
> The problems with 2PC are when after second phase commit was sent to all
> servers and before all report back one of them becomes unreachable/down ...
> (did it receive and do the 2nd commit or not) Such a transaction must stay
> open until the coserver is reachable again or an administrator committed/aborted it.
>
> It is multi master replication that usually has an asynchronous mode for
> performance, and there the trouble starts.

Let me diagram this so we can see the issues. Normal operation is:

        Master                  Slave
        ------                  -----
        commit ready-->
                                <--OK
        commit done--->
                                <--OK
        completed
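
As a rough illustration of the normal exchange, here is a minimal in-process sketch in Python. The names `Slave`, `State`, `on_commit_ready`, `on_commit_done`, and `run_commit` are hypothetical and exist only for this example; no networking, logging, or recovery is modeled.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PREPARED = "prepared"    # slave has answered OK to "commit ready"
    COMMITTED = "committed"  # slave has applied "commit done"


class Slave:
    """Hypothetical slave that only tracks its commit state."""

    def __init__(self):
        self.state = State.IDLE

    def on_commit_ready(self):
        # Phase 1: promise that the transaction can be committed.
        self.state = State.PREPARED
        return "OK"

    def on_commit_done(self):
        # Phase 2: only valid after the slave has prepared.
        if self.state is not State.PREPARED:
            raise RuntimeError("commit done received before commit ready")
        self.state = State.COMMITTED
        return "OK"


def run_commit(slave):
    """Drive the four-message exchange and return a trace of it."""
    trace = ["commit ready-->"]
    trace.append("<--" + slave.on_commit_ready())
    trace.append("commit done--->")
    trace.append("<--" + slave.on_commit_done())
    trace.append("completed")
    return trace
```

Calling `run_commit(Slave())` returns the same five steps as the diagram, in order. The failure cases correspond to one side disappearing between two of those steps.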

One possible failure is:

        Master                  Slave
        ------                  -----
        commit ready-->
                                <--OK
        commit done--->
                                dies here
        stuck waiting

Another possible failure is:

        Master                  Slave
        ------                  -----
        commit ready-->
                                <--OK
        dies here
                                stuck waiting

Are these the issues? Can't we just add GUC timeouts to cause the
commit to fail, and the slave to stop waiting? I suppose a problem is:

        Master                  Slave
        ------                  -----
        commit ready-->
                                <--OK
        sleep
                                stuck waiting, times out
        commit done
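
The timeout problem can be shown with a toy model in Python. This is only a sketch under stated assumptions: `timeout_race` is a hypothetical function, `slave_timeout` stands in for the proposed (not existing) GUC timeout, and the slave is assumed to abort on its own once the timeout expires.

```python
def timeout_race(master_sleep, slave_timeout):
    """Return (master_outcome, slave_outcome) for the 'sleep' scenario.

    The slave has already answered OK to "commit ready" and starts
    waiting at time 0.  The master sends "commit done" at time
    master_sleep.  Both values use the same arbitrary time unit.
    """
    # Assumption: a slave whose timeout expires gives up and aborts.
    if master_sleep > slave_timeout:
        slave_outcome = "abort"
    else:
        slave_outcome = "commit"

    # The master was only slow, not dead, so it still commits and has
    # no way to know that the slave already gave up.
    master_outcome = "commit"
    return master_outcome, slave_outcome
```

With `timeout_race(5, 10)` both sides commit. With `timeout_race(30, 10)` the master commits while the slave has aborted, which is the inconsistent outcome the diagram describes.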

Could we allow slaves to check whether the master's backend is still alive,
perhaps by asking the postmaster, similar to what we do with the cancel
signal? That way, the slave would never time out and would keep waiting as
long as the master was alive.
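
A minimal sketch of that idea in Python, with heavy assumptions: `next_message` and `master_alive` are hypothetical callables (the second stands in for asking the master's postmaster whether the backend still exists), and `max_polls` is there only so the example terminates. What the slave should do once it learns the master is dead is left to the caller, since that is not settled here.

```python
def wait_for_commit_done(next_message, master_alive, max_polls=1000):
    """Wait for "commit done" without using a wall-clock timeout.

    next_message: hypothetical callable returning the next message from
                  the master, or None if nothing has arrived yet.
    master_alive: hypothetical callable returning True while the
                  master's backend still exists.
    max_polls:    bound that keeps this sketch finite; the proposal
                  itself would keep waiting while the master is alive.
    """
    for _ in range(max_polls):
        if next_message() == "commit done":
            return "commit"
        # A slow master is not a reason to give up; only a dead one is.
        if not master_alive():
            return "master dead"
    return "still waiting"
```

For example, with a master that is alive but has not yet answered, `wait_for_commit_done(lambda: None, lambda: True, max_polls=3)` returns `"still waiting"` rather than giving up, which avoids the slow-master case. It does not by itself resolve a master that really has died.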

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
