From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, hlinnakangas(at)vmware(dot)com
Subject: Re: [PATCH 8/8] Introduce wal decoding via catalog timetravel
Date: 2012-10-11 02:16:39
Message-ID: CAM-w4HM7huYyeZDzwNwb3Doq8hGVXivUcbB-7SZnaowVp13Zgg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 11, 2012 at 2:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I think I've mentioned it before, but in the interest of not being
>> seen to critique the bikeshed only after it's been painted: this
>> design gives up something very important that exists in our current
>> built-in replication solution, namely pipelining.
>
> Isn't there an even more serious problem, namely that this assumes
> *all* transactions are serializable? What happens when they aren't?
> Or even just that the effective commit order is not XID order?

Firstly, I haven't read the code but I'm confident it doesn't make the
elementary error of assuming commit order == xid order. I assume it's
applying the reassembled transactions in commit order.
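For what it's worth, the commit-order rule is easy to sketch. This is not the patch's actual code, just a toy model (record shapes and names invented) of buffering decoded changes per XID and emitting each transaction only at its commit record, so the output order is commit order even when XIDs commit out of order:

```python
# Toy model only: buffer decoded changes per XID, emit at commit.
from collections import defaultdict

def replay_in_commit_order(wal_records):
    pending = defaultdict(list)            # xid -> buffered changes
    for rec in wal_records:
        if rec["type"] == "change":
            pending[rec["xid"]].append(rec["data"])
        elif rec["type"] == "commit":
            yield rec["xid"], pending.pop(rec["xid"], [])
        elif rec["type"] == "abort":
            pending.pop(rec["xid"], None)  # discarded, never replayed

# A lower XID that commits later is still emitted later:
wal = [
    {"type": "change", "xid": 100, "data": "INSERT a"},
    {"type": "change", "xid": 101, "data": "INSERT b"},
    {"type": "commit", "xid": 101},
    {"type": "commit", "xid": 100},
]
print(list(replay_in_commit_order(wal)))
# [(101, ['INSERT b']), (100, ['INSERT a'])]
```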

I don't think it assumes the transactions are serializable, because it's only concerned with writes, not reads. The transaction being replayed may or may not have been able to see the data written by other transactions that committed earlier, but that doesn't matter when reproducing its effects using constants. Whatever this transaction wrote becomes visible atomically at commit time, all of it or none of it. Even in non-serializable modes, updates take row locks, so nobody can see or modify the data until the transaction commits.

I have to say I was curious about Robert's point as well when I read
Peter's review, especially because this is exactly how the other
logical replication systems I've seen work, and I've always wondered
about it in those systems. Both MySQL and Oracle reassemble
transactions and don't write anything until they have the whole
transaction reassembled. That has always struck me as a bizarre and
obviously bad way to do it. It seems to me it would be better to
create a session (or autonomous transaction) for each transaction seen
in the stream and issue the DML as it shows up, committing or cleaning
each one up when the commit or abort (or shutdown or startup) record
comes along.
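A rough sketch of that alternative (all names invented, and the remote DML reduced to an in-memory list): open an apply session the first time an XID shows up, feed it each change as it is decoded, and close the session when the commit or abort record arrives:

```python
# Toy model only: one apply session per in-flight transaction,
# DML issued as it arrives rather than buffered until commit.
class ApplySession:
    def __init__(self, xid):
        self.xid = xid
        self.issued = []              # stands in for DML sent to the target

    def execute(self, dml):
        self.issued.append(dml)

def stream_apply(wal_records):
    sessions, log = {}, []
    for rec in wal_records:
        xid = rec["xid"]
        if rec["type"] == "change":
            sessions.setdefault(xid, ApplySession(xid)).execute(rec["data"])
        elif rec["type"] == "commit":
            sessions.pop(xid, None)   # would COMMIT the remote session here
            log.append(("commit", xid))
        elif rec["type"] == "abort":
            sessions.pop(xid, None)   # would ROLLBACK the remote session
            log.append(("abort", xid))
    return log
```

The point of the structure is that no change waits for the end of its transaction before being shipped; the commit/abort records only decide whether the already-issued work is kept.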

I imagine the reason lies in dealing with locking and ensuring that
you get correct results, without deadlocks, when multiple transactions
try to update the same record. But it seems to me the locks the source
database originally took should protect you against any such problems.
As long as you can suspend a transaction when it blocks on a lock and
keep processing WAL for other transactions (including an abort record
for the suspended transaction, if one happened due to a deadlock or
user interruption), you should be fine.
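To make the suspension idea concrete, here is a toy model (again, invented names, and row-granularity locks only): a change that would block on a lock held by another in-flight transaction parks that transaction, WAL processing continues, and the parked work is replayed once the holder commits or aborts and releases its locks:

```python
# Toy model only: suspend a blocked transaction instead of stalling
# the whole apply stream, and resume it when the lock holder finishes.
from collections import defaultdict

def apply_with_suspension(wal_records):
    row_locks = {}                   # row -> xid currently holding its lock
    blocked = {}                     # xid -> row it is waiting for
    queued = defaultdict(list)       # xid -> rows deferred while blocked
    applied = []

    def apply_change(xid, row):
        if xid in blocked:           # already suspended: defer in order
            queued[xid].append(row)
            return
        holder = row_locks.get(row)
        if holder is not None and holder != xid:
            blocked[xid] = row       # suspend on the conflicting lock
            queued[xid].append(row)
        else:
            row_locks[row] = xid
            applied.append((xid, row))

    for rec in wal_records:
        if rec["type"] == "change":
            apply_change(rec["xid"], rec["row"])
        else:                        # commit or abort releases its locks
            xid = rec["xid"]
            for r in [r for r, x in row_locks.items() if x == xid]:
                del row_locks[r]
            # wake any transaction whose awaited row is now free
            for waiter, row in list(blocked.items()):
                if row not in row_locks:
                    del blocked[waiter]
                    for r in queued.pop(waiter, []):
                        apply_change(waiter, r)
    return applied
```

In this model the blocked transaction's later changes queue up behind it in order, so resuming it replays its work exactly as decoded; a real implementation would of course also have to surface deadlocks rather than wait forever.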

--
greg
