Two-phase commit issues

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Two-phase commit issues
Date: 2005-05-18 21:15:09
Message-ID: 25312.1116450909@sss.pgh.pa.us
Lists: pgsql-hackers

I've started to look seriously at Heikki's patch for two-phase commit.
There are a few issues that probably deserve discussion:

* The major missing issue that I've come across so far is that
subtransaction and multixact state isn't preserved across a crash.
Assuming that we want to store only top-level XIDs in the shared-memory
list of prepared XIDs (which I think is important), it is essential that
crash restart rebuild the pg_subtrans status for prepared transactions.
The subxacts of a prepared xact have to be seen as still running, and
they won't be unless the subxact links are there. Since subtrans.c is
designed to wipe all its state on restart, we need to recreate those
entries. Fortunately this doesn't seem hard: the state file for a
prepared xact will include all of its subxact XIDs, and we can just
do SubTransSetParent() on them while rereading the state file. (AFAICS
it's sufficient to make each subxact link directly to the top XID, even
if there was a more complex hierarchy originally.) Similarly, we've got
to reconstruct MultiXactIds that any prepared xacts are members of, else
row-level locks taken out by prepared xacts won't be enforced correctly.
I think this can be handled if we add to the state files a list of all
MultiXactIds that each prepared xact belongs to, and then during restart
forcibly recreate those MultiXactIds. (They would only be rebuilt with
prepared XIDs, not any ordinary XIDs that might originally have been
members.) This seems to require some new code in multixact.c, but not
anything fundamentally difficult --- Alvaro, do you see any likely
problems in this stuff?
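
To make the restart idea concrete, here is a toy Python model, not PostgreSQL code. `PreparedStateFile`, `rebuild_after_restart`, `is_still_running` and the dict-based stand-ins for the subtransaction parent map and the multixact store are all hypothetical names. Only two ideas are taken from the text above: every subxact is linked directly to the top XID, and each MultiXactId is recreated with prepared members only. As a simplification, the sketch records the top XID as the multixact member.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreparedStateFile:
    """Hypothetical on-disk record for one prepared transaction."""
    top_xid: int
    subxact_xids: List[int] = field(default_factory=list)
    multixact_ids: List[int] = field(default_factory=list)


def rebuild_after_restart(state_files):
    """Rebuild the in-memory state that a restart would have wiped.

    Returns (prepared, parent_of, multixact_members).
    """
    prepared = set()          # top-level XIDs only, as in the shared list
    parent_of = {}            # subxact XID -> parent XID
    multixact_members = {}    # MultiXactId -> set of member XIDs

    for sf in state_files:
        prepared.add(sf.top_xid)
        # Link each subxact straight to the top XID; the original
        # nesting is not needed to answer "is it still running?".
        for sub in sf.subxact_xids:
            parent_of[sub] = sf.top_xid
        # Recreate each MultiXactId with prepared members only.
        for mxid in sf.multixact_ids:
            multixact_members.setdefault(mxid, set()).add(sf.top_xid)
    return prepared, parent_of, multixact_members


def is_still_running(xid, prepared, parent_of):
    """Follow parent links to the top XID, then check the prepared list."""
    while xid in parent_of:
        xid = parent_of[xid]
    return xid in prepared


files = [
    PreparedStateFile(top_xid=100, subxact_xids=[101, 102], multixact_ids=[7]),
    PreparedStateFile(top_xid=200, multixact_ids=[7]),
]
prepared, parent_of, members = rebuild_after_restart(files)
```

In this model a lookup on subxact 101 resolves to top XID 100 and is reported as running, while an XID that appears in no state file is not.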

* The patch is designed to dump state files into WAL as well as onto
disk. Why? Wouldn't it be better just to write and fsync the state
file before reporting successful prepare? That would get rid of the
need for checkpoint-time fsyncs.
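
A minimal sketch of the "write and fsync before reporting success" ordering, in Python rather than backend C. The function names, the use of the gid as a file name, and the payload format are hypothetical. The write-to-temporary-then-rename step is an added precaution in this sketch, not part of the proposal, and the directory fsync assumes a POSIX system.

```python
import os
import tempfile


def write_state_file_durably(path, payload):
    """Write payload to path and force it to disk before returning."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())      # file contents reach stable storage
    os.replace(tmp_path, path)    # atomic rename on POSIX
    # fsync the directory so the new name itself survives a crash (POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def prepare(gid, payload, state_dir):
    """Report success only after the state file is durable."""
    path = os.path.join(state_dir, gid)
    write_state_file_durably(path, payload)
    return "PREPARE"


with tempfile.TemporaryDirectory() as d:
    status = prepare("gid1", b"xid=100 subxacts=101,102", d)
    with open(os.path.join(d, "gid1"), "rb") as f:
        saved = f.read()
```

Because the fsync happens inside `prepare`, nothing is left for a later checkpoint to flush on behalf of this file.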

* I'm inclined to think that the "gid" identifiers for prepared
transactions ought to be SQL identifiers (names), not string literals.
Was there a particular reason for making them strings?

* What are we going to do with GUC variables? My feeling is that
the only sane answer is that PREPARE is the same as COMMIT as far as
local GUC variables go, and COMMIT/ROLLBACK PREPARED have no effect
on GUC state. Otherwise it's really unclear what to do. Consider
	SET myvar = foo;
	BEGIN;
	SET myvar = bar;
	PREPARE gid;
	SHOW myvar;		-- what do you see ... foo or bar?
	SET myvar = baz;	-- is this even legal?
	ROLLBACK PREPARED gid;
	SHOW myvar;		-- now what do you see ... foo or baz?
Since local GUC changes aren't going to be saved/restored across a
crash anyway, I can't see a point in doing anything really complex.
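
For illustration, here is what the example would show if the rule suggested above were adopted (PREPARE acts like COMMIT for GUC changes, and COMMIT/ROLLBACK PREPARED leave GUC state alone). This is a toy Python model with a hypothetical `Session` class, not backend code, and it ignores SET LOCAL entirely.

```python
class Session:
    """Toy session: committed GUC values plus an in-transaction overlay."""

    def __init__(self):
        self.committed = {}     # values that survive transaction end
        self.pending = None     # SETs made inside the open transaction
        self.prepared = set()   # gids of prepared transactions

    def begin(self):
        self.pending = {}

    def set(self, name, value):
        target = self.committed if self.pending is None else self.pending
        target[name] = value

    def show(self, name):
        if self.pending is not None and name in self.pending:
            return self.pending[name]
        return self.committed[name]

    def prepare(self, gid):
        # Assumed rule: PREPARE treats GUC changes exactly like COMMIT.
        self.committed.update(self.pending)
        self.pending = None
        self.prepared.add(gid)

    def rollback_prepared(self, gid):
        # Assumed rule: no effect on GUC state at all.
        self.prepared.remove(gid)

    commit_prepared = rollback_prepared  # same GUC behaviour either way


s = Session()
s.set("myvar", "foo")
s.begin()
s.set("myvar", "bar")
s.prepare("gid")
after_prepare = s.show("myvar")      # "bar"
s.set("myvar", "baz")                # legal: no transaction is open
s.rollback_prepared("gid")
after_rollback = s.show("myvar")     # "baz"
```

Under that rule the answers to the three questions are "bar", "yes", and "baz".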

* There are some fairly ugly cases associated with creation and deletion
of temporary tables as well. I think we might want to just decree that
you can't PREPARE a transaction that included creating or dropping a
temp table. Does anyone have much of a problem with that?
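
The restriction could be as simple as a per-transaction flag checked at PREPARE time. The sketch below is a toy Python model under that assumption; `Transaction`, `PrepareNotAllowed` and the error text are hypothetical and do not describe how the backend tracks temp tables.

```python
class PrepareNotAllowed(Exception):
    """Raised when a transaction may not be prepared."""


class Transaction:
    def __init__(self, existing_temp_tables=()):
        self.temp_tables = set(existing_temp_tables)
        self.touched_temp = False   # set once a temp table is created/dropped

    def create_table(self, name, temp=False):
        if temp:
            self.temp_tables.add(name)
            self.touched_temp = True

    def drop_table(self, name):
        if name in self.temp_tables:
            self.temp_tables.discard(name)
            self.touched_temp = True

    def prepare(self, gid):
        if self.touched_temp:
            raise PrepareNotAllowed(
                "cannot PREPARE a transaction that created or dropped "
                "a temp table")
        return gid


ok = Transaction()
ok.create_table("orders")
result = ok.prepare("gid1")

bad = Transaction()
bad.create_table("scratch", temp=True)
try:
    bad.prepare("gid2")
    rejected = False
except PrepareNotAllowed:
    rejected = True

dropper = Transaction(existing_temp_tables=["old_scratch"])
dropper.drop_table("old_scratch")
try:
    dropper.prepare("gid3")
    drop_rejected = False
except PrepareNotAllowed:
    drop_rejected = True
```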

regards, tom lane
