Re: Nested transactions

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Nested transactions
Date: 2003-03-18 09:32:58
Message-ID: 3ujd7v87ulj3lh5e94ugor12upvg24s6km@4ax.com
Lists: pgsql-hackers

On Tue, 18 Mar 2003 00:20:40 -0400, Alvaro Herrera
<alvherre(at)dcc(dot)uchile(dot)cl> wrote:
>We will add a field to TransactionStateData with the xid of the parent
>transaction. If it's set to InvalidTransactionId, then the transaction is a
>parent transaction

We need a stack of currently executing transactions.

>Whenever a transaction aborts, the backend is put into TRANS_ABORT state
>only if it is a toplevel transaction. If it is not, the backend is returned
>to TRANS_INPROGRESS, the TransactionId goes back to the parent
>(sub)transaction Id, and the pg_clog records the transaction as aborted.
>The subtransaction tree may delete the corresponding branch.

First we have to work on the semantics. Immediately aborting the
subtransaction is not consistent with what happens at top level, where
the backend stays in TRANS_ABORT until the transaction is ended. Better
stay in the subtransaction and mark it with a new SUBTRANS_ABORT state
(or just TRANS_ABORT because we know we are in a subtransaction). In
that state:

On ROLLBACK: end the current subtransaction, pop it from the stack,
restore state of the enclosing transaction.

On COMMIT: end the current subtransaction (marking it as aborted in
pg_clog), pop it from the stack, set the enclosing transaction to
(SUB)TRANS_ABORT.

On BEGIN: start a new subtransaction, mark it as SUBTRANS_ABORT! This
may sound weird, but I think we need it to make nested transactions
useful for scripts and functions.

Possible micro optimisation: don't assign a new xid, but keep track of
nesting level.

Any other command is ignored.
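
The rules above for an aborted subtransaction can be sketched as a small
state machine. This is only an illustration in Python, not backend code:
the Backend class, its fail() and execute() methods and the single ABORT
constant (standing for both TRANS_ABORT and SUBTRANS_ABORT) are made up
for the example, and the behaviour of BEGIN/COMMIT/ROLLBACK in a
non-aborted subtransaction is an assumption. Ending the top-level
transaction is deliberately left out.

```python
from dataclasses import dataclass, field

IN_PROGRESS = "INPROGRESS"
ABORT = "ABORT"  # stands for both TRANS_ABORT and the proposed SUBTRANS_ABORT


@dataclass
class Backend:
    # Stack of transaction states; stack[0] is the top-level transaction.
    stack: list = field(default_factory=lambda: [IN_PROGRESS])

    def fail(self):
        """A command failed: stay in the current (sub)transaction, mark it aborted."""
        self.stack[-1] = ABORT

    def execute(self, command):
        """Apply one command; return False if it was ignored, True otherwise."""
        aborted = self.stack[-1] == ABORT
        if command == "BEGIN":
            # Inside an aborted subtransaction the new one starts out aborted.
            self.stack.append(ABORT if aborted else IN_PROGRESS)
        elif command in ("COMMIT", "ROLLBACK"):
            if len(self.stack) == 1:
                raise ValueError("ending the top-level transaction is not modelled")
            self.stack.pop()
            if aborted and command == "COMMIT":
                # COMMIT of an aborted subtransaction aborts the enclosing one.
                self.stack[-1] = ABORT
        elif aborted:
            return False  # any other command is ignored
        return True
```

The point of starting a nested BEGIN in the aborted state is that a
script with balanced BEGIN/COMMIT pairs ends up at the same nesting
level it would have reached without the error, so the caller's own
COMMIT or ROLLBACK still applies to the transaction it expects.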

Do we need new convenience commands ROLLBACK ALL and/or COMMIT ALL?

>Commit/abort protocol in pg_clog [...]
>Tuple Visibility [...]

I had the strange feeling that this has already been done in more
detail and was almost going to refer you to
http://archives.postgresql.org/pgsql-hackers/2002-11/msg01124.php,
but checking the link I realised that this one message is truncated
near the beginning. The rest of the thread is ok but the discussion
looks a bit out of context :-(

So I repost that message here, although it doesn't fully reflect my
current opinions. (I'll try to integrate objections and suggestions
into a new proposal and post it tomorrow.)

On Fri, 29 Nov 2002 18:03:56 +0100, I wrote:
|On Thu, 28 Nov 2002 12:59:21 -0500 (EST), Bruce Momjian
|<pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
|>Yes, locking is one possible solution, but no one likes that. One hack
|>lock idea would be to create a subtransaction-only lock, [...]
|>
|>> [...] without
|>> having to touch the xids in the tuple headers.
|>
|>Yes, you could do that, but we can easily just set the clog bits
|>atomically,
|
|From what I read above I don't think we can *easily* set more than one
|transaction's bits atomically.
|
|> and it will not be needed --- the tuple bits really don't
|>help us, I think.
|
|Yes, this is what I said, or at least tried to say. I just wanted to
|make clear how this new approach (use the fourth status) differs from
|older proposals (replace subtransaction ids in tuple headers).
|
|>OK, we put it in a file. And how do we efficiently clean it up?
|>Remember, it is only to be used for a _brief_ period of time. I think a
|>file system solution is doable if we can figure out a way not to create
|>a file for every xid.
|
|I don't want to create one file for every transaction, but rather a
|huge (sparse) array of parent xids. This array is divided into
|manageable chunks, represented by files, "pg_subtrans_NNNN". These
|files are only created when necessary. At any time only a tiny part
|of the whole array is kept in shared buffers. This concept is similar
|or almost equal to pg_clog, which is an array of doublebits.
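
To illustrate how such a chunked array could be addressed, here is a
minimal Python sketch. The chunk size, the 4-byte entry width and the
hexadecimal file numbering are assumptions made for the example; the
text above only gives the name pattern pg_subtrans_NNNN. The function
names are hypothetical. clog_location shows the analogous addressing for
an array with two bits per xid.

```python
XID_BYTES = 4                 # one parent xid per entry (assumes 32-bit xids)
ENTRIES_PER_FILE = 64 * 1024  # hypothetical chunk size, not from the proposal


def subtrans_location(xid):
    """Return (file name, byte offset) of the parent-xid entry for xid."""
    segment, index = divmod(xid, ENTRIES_PER_FILE)
    return f"pg_subtrans_{segment:04X}", index * XID_BYTES


def clog_location(xid):
    """Return (byte index, bit shift) of the two status bits for xid."""
    return xid // 4, (xid % 4) * 2
```

Because the file name is computed from the xid, a file only has to exist
once some subtransaction in its range has recorded a parent, which is
what keeps the array sparse on disk.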
|
|>Maybe we write the xid's to a file in a special directory in sorted
|>order, and backends can do a btree search of each file in that directory
|>looking for the xid, and then knowing the master xid, look up that
|>status, and once all the children xid's are updated, you delete the
|>file.
|
|Yes, dense arrays or btrees are other possible implementations. But
|for simplicity I'd do it pg_clog style.
|
|>Yes, but again, the xid status of subtransactions is only updated just
|>before commit of the main transaction, so there is little value to
|>having those visible.
|
|Having them visible solves the atomicity problem without requiring
|long locks. Updating the status of a single (main or sub) transaction
|is atomic, just like it is now.
|
|Here is what is to be done for some operations:
|
|BEGIN main transaction:
| Get a new xid (no change to current behaviour).
| pg_clog[xid] is still 00, meaning active.
| pg_subtrans[xid] is still 0, meaning no parent.
|
|BEGIN subtransaction:
| Push current transaction info onto local stack.
| Get a new xid.
| Record parent xid in pg_subtrans[xid].
| pg_clog[xid] is still 00.
|
|ROLLBACK subtransaction:
| Set pg_clog[xid] to 10 (aborted).
| Optionally set clog bits for subsubtransactions to 10.
| Pop transaction info from stack.
|
|COMMIT subtransaction:
| Set pg_clog[xid] to 11 (committed subtrans).
| Don't touch clog bits for subsubtransactions!
| Pop transaction info from stack.
|
|ROLLBACK main transaction:
| Set pg_clog[xid] to 10 (aborted).
| Optionally set clog bits for subtransactions to 10.
|
|COMMIT main transaction:
| Set pg_clog[xid] to 01 (committed).
| Optionally set clog bits for subtransactions from 11 to 01.
| Don't touch clog bits for aborted subtransactions!
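
The six operations above can be modelled in a few lines. This is a
sketch under simplifying assumptions, not an implementation: pg_clog and
pg_subtrans are plain Python dicts instead of on-disk arrays, the
Transaction class and its method names are invented for the example, the
starting xid is arbitrary, and all the "optionally" steps are left out.

```python
import itertools

ACTIVE, COMMITTED, ABORTED, SUBCOMMITTED = 0b00, 0b01, 0b10, 0b11

pg_clog = {}      # xid -> status; a missing entry reads as 00 (active)
pg_subtrans = {}  # xid -> parent xid; a missing entry reads as "no parent"
_next_xid = itertools.count(100)  # arbitrary starting xid for the demo


class Transaction:
    def __init__(self):
        # BEGIN main transaction: just get a new xid.
        self.stack = [next(_next_xid)]

    def begin_sub(self):
        xid = next(_next_xid)
        pg_subtrans[xid] = self.stack[-1]  # record parent xid
        self.stack.append(xid)             # keep it on the local stack
        return xid

    def rollback_sub(self):
        pg_clog[self.stack.pop()] = ABORTED

    def commit_sub(self):
        # Status 11: the final outcome depends on the parent.
        pg_clog[self.stack.pop()] = SUBCOMMITTED

    def rollback(self):
        pg_clog[self.stack.pop()] = ABORTED

    def commit(self):
        # The only write needed to commit the whole tree.
        pg_clog[self.stack.pop()] = COMMITTED
```

Note that commit() never touches the entries of the subtransactions:
those that committed stay at 11 and are resolved through their parent,
and those that were rolled back stay at 10.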
|
|Visibility check by other transactions: If a tuple is visited and its
|XMIN/XMAX_IS_COMMITTED/ABORTED flags are not yet set, pg_clog has to
|be consulted to find out the status of the inserting/deleting
|transaction xid. If pg_clog[xid] is ...
|
| 00: transaction still active
|
| 10: aborted
|
| 01: committed
|
| 11: committed subtransaction, have to check parent
|
|Only in this last case do we have to get parentxid from pg_subtrans.
|Now we look at pg_clog[parentxid]. If we find ...
|
| 00: parent still active, so xid is considered active, too
|
| 10: parent aborted, so xid is considered aborted,
| optionally set pg_clog[xid] = 10
|
| 01: parent committed, so xid is considered committed,
| optionally set pg_clog[xid] = 01
|
| 11: recursively check grandparent(s) ...
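
The lookup described above, seen from another transaction, might look
like the following Python sketch. It assumes dict-based stand-ins for
pg_clog and pg_subtrans, uses a loop instead of recursion, and the
function name resolve_status is made up. The optional write-back of the
final status is included; checks against the local transaction stack are
not.

```python
ACTIVE, COMMITTED, ABORTED, SUBCOMMITTED = 0b00, 0b01, 0b10, 0b11


def resolve_status(xid, pg_clog, pg_subtrans):
    """Return the effective status (00, 01 or 10) of xid."""
    chain = []  # xids found with status 11 on the way up
    status = pg_clog.get(xid, ACTIVE)
    while status == SUBCOMMITTED:
        chain.append(xid)
        xid = pg_subtrans[xid]             # switch to the parent
        status = pg_clog.get(xid, ACTIVE)
    if status != ACTIVE:
        # Optional side effect: replace 11 with the final status.
        for child in chain:
            pg_clog[child] = status
    return status
```

While the ancestor is still active nothing is written back, so the
subtransaction keeps status 11 and the next check follows the parent
chain again.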
|
|For brevity the following operations are not covered in detail:
|. Visibility checks for tuples inserted/deleted by a (sub)transaction
|belonging to the current transaction tree (have to check local
|transaction stack whenever we look at a xid or switch to a parent xid)
|. HeapTupleSatisfiesUpdate (sometimes has to wait for parent
|transaction)
|
|The trick here is that subtransaction status is immediately updated
|in pg_clog on commit/abort. Main transaction commit is atomic (just
|set its commit bit). Status 11 is short-lived, it is replaced with
|the final status by one or more of
|
| - COMMIT/ROLLBACK of the main transaction
| - a later visibility check (as a side effect)
| - VACUUM
|
|pg_subtrans cleanup: A pg_subtrans_NNNN file covers a known range of
|transaction ids. As soon as none of these transactions has a pg_clog
|status of 11, the pg_subtrans_NNNN file can be removed. VACUUM can do
|this, and it won't even have to check the heap.
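
The cleanup rule can be written down as a simple predicate. This Python
sketch follows the rule literally and rests on several assumptions: the
chunk size is hypothetical, pg_clog is a dict, the function name is
invented, and every xid in the file's range is assumed to have finished
already (an xid that is still active could become 11 later and would
then still need its parent entry).

```python
SUBCOMMITTED = 0b11
ENTRIES_PER_FILE = 64 * 1024  # hypothetical chunk size, not from the proposal


def segment_is_removable(segment, pg_clog):
    """True if no xid covered by pg_subtrans file number `segment` has status 11."""
    first = segment * ENTRIES_PER_FILE
    covered = range(first, first + ENTRIES_PER_FILE)
    # A missing pg_clog entry reads as 00, which is not 11.
    return not any(pg_clog.get(xid, 0) == SUBCOMMITTED for xid in covered)
```

The check reads only pg_clog, which matches the remark that VACUUM would
not have to look at the heap for this.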

>I don't know how this will be, but the subsystem will have to answer the
>following questions:
>
>- Xid of my parent transaction

Yes, and not only *my* but *any* transaction. So this has to be
globally visible.

>- List of Xids of non-aborted child transactions

Only if we do "parent commit updates all child states" as suggested by
Tom in
http://archives.postgresql.org/pgsql-hackers/2002-11/msg01129.php.
This can be in a data structure that's private to the transaction.

Servus
Manfred
