SSI rw-conflicts and 2PC

From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: SSI rw-conflicts and 2PC
Date: 2012-02-14 02:57:12
Message-ID: 20120214025712.GQ11222@csail.mit.edu
Lists: pgsql-hackers

Looking over the SSI 2PC code recently, I noticed that I overlooked a
case that could lead to non-serializable behavior after a crash.

When we PREPARE a serializable transaction, we store part of the
SERIALIZABLEXACT in the statefile (in addition to the list of SIREAD
locks). One of the pieces of information we record is whether the
transaction had any conflicts in or out. The problem is that this can
change if a new conflict occurs after the transaction has prepared.

Here's an example of the problem (based on the receipt-report test):

-- Setup
CREATE TABLE ctl (k text NOT NULL PRIMARY KEY, deposit_date date NOT NULL);
INSERT INTO ctl VALUES ('receipt', DATE '2008-12-22');
CREATE TABLE receipt (receipt_no int NOT NULL PRIMARY KEY, deposit_date date NOT NULL, amount numeric(13,2));

-- T2
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO receipt VALUES (3, (SELECT deposit_date FROM ctl WHERE k = 'receipt'), 4.00);
PREPARE TRANSACTION 't2';

-- T3
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE ctl SET deposit_date = DATE '2008-12-23' WHERE k = 'receipt';
COMMIT;

-- T1
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM ctl WHERE k = 'receipt';
SELECT * FROM receipt WHERE deposit_date = DATE '2008-12-22';
COMMIT;

Running this sequence of transactions normally, T1 will be rolled back
because of the pattern of conflicts T1 -> T2 -> T3, as we'd expect. This
should still be true even if we restart the database before executing
the last transaction (T1) -- but it's not. The problem is that, when T2
prepared, it had no conflicts, so we recorded that in the statefile.
The T2 -> T3 conflict happened later, so we didn't know about it during
recovery.
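
To make the failure concrete, here is a toy model in Python (not
PostgreSQL code). The Sxact class and the is_pivot helper are
hypothetical names invented for this sketch; they only mimic the two
conflict flags described above and assume that a transaction with both
a conflict in and a conflict out marks a dangerous structure.

```python
from dataclasses import dataclass, replace


@dataclass
class Sxact:
    """Hypothetical stand-in for the per-transaction conflict flags."""
    name: str
    conflict_in: bool = False   # another transaction has an rw-conflict into this one
    conflict_out: bool = False  # this transaction has an rw-conflict out to another


def is_pivot(x: Sxact) -> bool:
    # Simplification: both flags set means "middle of a dangerous structure".
    return x.conflict_in and x.conflict_out


# T2 prepares with no conflicts; the statefile captures the flags at that moment.
t2 = Sxact("T2")
statefile = replace(t2)  # independent copy, as written at PREPARE time

# T3 updates the row T2 read and commits: rw-conflict T2 -> T3, in memory only.
t2.conflict_out = True

# Crash and restart: T2 is rebuilt from the statefile, which is now stale.
recovered = replace(statefile)

# T1 reads the table T2 inserted into: rw-conflict T1 -> T2.
t2.conflict_in = True
recovered.conflict_in = True

print("no restart:", is_pivot(t2))            # dangerous structure detected
print("after restart:", is_pivot(recovered))  # conflict out was lost
```

Without the restart the model sees T2 as a pivot; after the restart it
does not, which is the false negative described above.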

I discussed this a bit with Kevin and we agreed that this is important
to fix, since it's a false negative that violates serializability. The
question is how to fix it. There are a couple of options...

The easiest answer would be to just treat every prepared transaction
found during recovery as though it had a conflict in and out. This is
roughly a one-line change, and it's certainly safe. But the downside is
that this is pretty restrictive: after recovery, we'd have to abort any
serializable transaction that tries to read anything that a prepared
transaction wrote, or modify anything that it read, until that
transaction is either committed or rolled back.
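
A minimal sketch of this first option, again as a Python toy rather
than the actual patch. The names recover_prepared, must_abort_reader
and must_abort_writer are made up for illustration, and the checks
deliberately ignore the commit-ordering refinements of the real
conflict detection.

```python
from dataclasses import dataclass


@dataclass
class Sxact:
    """Hypothetical stand-in for the per-transaction conflict flags."""
    name: str
    conflict_in: bool = False
    conflict_out: bool = False


def recover_prepared(name: str, saved_in: bool, saved_out: bool) -> Sxact:
    # Option 1: the flags saved at PREPARE time may be stale, so ignore
    # them and assume the worst for every recovered prepared transaction.
    return Sxact(name, conflict_in=True, conflict_out=True)


def must_abort_reader(writer: Sxact) -> bool:
    # A new transaction reading what `writer` wrote adds a conflict *in*
    # to `writer`; that completes a dangerous structure if `writer`
    # already has a conflict out.
    return writer.conflict_out


def must_abort_writer(reader: Sxact) -> bool:
    # A new transaction modifying what `reader` read adds a conflict *out*
    # from `reader`; dangerous if `reader` already has a conflict in.
    return reader.conflict_in


t2 = recover_prepared("T2", saved_in=False, saved_out=False)
print(must_abort_reader(t2), must_abort_writer(t2))
```

Because both flags are forced on, either kind of new conflict with the
recovered transaction is treated as dangerous, which is where the extra
aborts come from.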

To do better than that, we want to know accurately whether the prepared
transaction had a conflict with a transaction that prepared or
committed before the crash. We could do this if we had a way to append
a record to the 2PC statefile of an already-prepared transaction --
then we'd just add a new record indicating the conflict. Of course, we
don't have a way to do that. It'd be tricky to add support for this,
since it has to be crash-safe, so the question is whether the improved
precision justifies the complexity it would require.

A third option is to observe that the only conflicts *in* that matter
from a recovered prepared transaction are from other prepared
transactions. So we could have prepared transactions include in their
statefile the xids of any prepared transactions they conflicted with
at prepare time, and match them up during recovery to reconstruct the
graph. This is a middle ground between the other two options. It
doesn't require modifying the statefile after prepare. However, conflicts
*out* to non-prepared transactions do matter, and this doesn't record
those, so we'd have to do the conservative thing -- which means that
after recovery, no one can read anything a prepared transaction wrote.
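
One way to picture the matching step, as a Python sketch under my own
assumptions about what the statefile would hold. The Statefile class,
its in_from and out_to fields, and the recover function are all
hypothetical; nothing like them exists in the current code. The sketch
assumes each statefile lists the xids of already-prepared transactions
it conflicted with, in each direction, at its own prepare time.

```python
from dataclasses import dataclass, field


@dataclass
class Statefile:
    """Hypothetical statefile contents for one prepared transaction."""
    xid: int
    in_from: set = field(default_factory=set)  # prepared xids with a conflict into us
    out_to: set = field(default_factory=set)   # prepared xids we have a conflict out to


def recover(statefiles):
    prepared = {sf.xid for sf in statefiles}
    conflict_in = {xid: False for xid in prepared}
    for sf in statefiles:
        # An edge recorded by either endpoint counts. Xids that are no
        # longer prepared (already committed or rolled back) are ignored.
        if sf.in_from & prepared:
            conflict_in[sf.xid] = True
        for target in sf.out_to & prepared:
            conflict_in[target] = True
    # Conflicts out to non-prepared transactions were never recorded,
    # so conflict_out is conservatively assumed for everyone.
    return {
        xid: {"conflict_in": conflict_in[xid], "conflict_out": True}
        for xid in prepared
    }


flags = recover([
    Statefile(100),
    Statefile(101, out_to={100}),   # 101 prepared after 100 and read what 100 wrote
    Statefile(102, in_from={999}),  # 999 is no longer prepared after the crash
])
for xid in sorted(flags):
    print(xid, flags[xid])
```

Here only xid 100 ends up with a conflict in, reconstructed from the
statefile of 101, while every recovered transaction keeps the
conservative conflict out.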

I thought I'd throw these options out there to see which ones people
think are reasonable (or whether anyone has better ideas). Of the three, I think the
first (simplest) solution is the only one we could plausibly backpatch
to 9.1. But if the extra aborts after recovery seem too expensive, we
may want to consider one of the other options for future releases.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/
