Nested transactions and tuple header info

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Blasby <dblasby(at)refractions(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Nested transactions and tuple header info
Date: 2004-06-01 22:40:07
Message-ID: 200406012240.i51Me7s10128@candle.pha.pa.us
Lists: pgsql-hackers

Alvaro Herrera wrote:
>
> Yes, I did follow it (you mean XMAX_IS_XMIN, right? I suppose no tuple
> can really have Cmin == Cmax). I'm not claiming I understood it fully
> though. But as you see, since the assumption is not valid we have to
> drop the idea and put back the Xmax as a field on its own on
> HeapTupleHeader (which is what I had done before Bruce persuaded me not
> to). I don't really like this idea but I don't see other way out.
>
> A couple of days ago I was going to propose putting Xmax as a separate
> field only as needed, in a way similar to the way Oid is handled ---
> thus we would enlarge the tuple if and only if the creating transaction
> deletes it. This would be nice because one would expect that there are
> not that many tuples created and deleted by the same transaction, so
> we'd localize the inefficiency of storing both fields (Cmin and Xmax)
> only on tuples that need it. While I was writing the proposal I
> realised that it'd mean enlarging tuples that are already on disk, and
> there's no way we can do that.

I have read the archives and I think I understand the issue. Before
subtransactions, the only transaction that could see and hence delete a
tuple created by an open transaction was the transaction itself, and to
keep the cmin and cmax, we created a separate tuple bit which indicated
the xmin and xmax were the same.

With subtransactions, other xids (subtransaction ids) can see and delete
tuples created by earlier parts of the main transaction, so the tuple
bit meaning xmin=xmax no longer works.

So, we need a way to record the xmin and xmax while keeping cmin and
cmax in the tuple header. My idea is for subtransactions to create
additional xid's that represent the opposite of the commit state for
changing tuples created by earlier subtransactions.

    BEGIN;                  xid=1
    INSERT a;
        BEGIN;              xid=2
        INSERT b;
        DELETE a;           xid=3
        COMMIT;
    COMMIT;

When "DELETE a" happens, we remove the xmin=1 from the tuple header and
replace it with xmin=3. xid3 will be marked as committed if xid2
aborts, and will be marked as aborted if xid2 commits.

So, if xid2 aborts, the insert of xid1 should be honored, and xid3 is
marked as committed, and the opposite if xid2 commits.
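
A minimal Python sketch of the inverted-state rule described above. The
names State, phantom_state, and tuple_a_visible are hypothetical and do
not correspond to any PostgreSQL code. The sketch also assumes the
top-level transaction (xid=1) has committed, so that only the outcome
of xid=2 matters.

```python
from enum import Enum


class State(Enum):
    IN_PROGRESS = "in progress"
    COMMITTED = "committed"
    ABORTED = "aborted"


def phantom_state(sub_state):
    """Return the state of a phantom xid, given the state of the
    subtransaction that created it (hypothetical helper).

    The phantom xid carries the opposite outcome of its subtransaction.
    """
    if sub_state is State.COMMITTED:
        return State.ABORTED
    if sub_state is State.ABORTED:
        return State.COMMITTED
    return State.IN_PROGRESS


def tuple_a_visible(xid2_state):
    """Tuple "a" has xmin=3 (the phantom xid) after "DELETE a".

    Assumption: xid=1 has committed, so the tuple is visible exactly
    when the phantom xid counts as committed.
    """
    return phantom_state(xid2_state) is State.COMMITTED


if __name__ == "__main__":
    for outcome in (State.ABORTED, State.COMMITTED):
        print("xid2", outcome.value, "-> a visible:", tuple_a_visible(outcome))
```

If xid2 aborts, the delete is undone and "a" stays visible; if xid2
commits, the phantom xid is aborted and "a" is treated as deleted.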

We would have to use pg_subtrans so these phantom xids could point to
the base xid and a list would have to be maintained so higher-level
subtransactions aborting would trigger changes in these phantom xids,
that is, if xid1 aborts, xid2 should abort as well.
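
The bookkeeping in the previous paragraph could look roughly like the
Python sketch below. It is an illustration under stated assumptions, not
a description of pg_subtrans: the class XidTable, its methods, and the
(owner, base) pair stored for each phantom xid are all hypothetical. The
"parent" map stands in for the role the paragraph gives pg_subtrans, and
"phantoms" is the list that has to be maintained.

```python
class XidTable:
    """Toy in-memory model of xid states (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.state = {}     # xid -> "in progress" | "committed" | "aborted"
        self.parent = {}    # subtransaction xid -> parent xid (or None)
        self.phantoms = {}  # phantom xid -> (owner xid, base xid)

    def begin(self, xid, parent=None):
        self.state[xid] = "in progress"
        self.parent[xid] = parent

    def add_phantom(self, phantom, owner, base):
        # owner: subtransaction whose outcome the phantom inverts
        # base:  xid that originally created the tuple
        self.state[phantom] = "in progress"
        self.phantoms[phantom] = (owner, base)

    def _resolve_phantoms(self):
        for phantom, (owner, base) in self.phantoms.items():
            if self.state[base] == "aborted":
                # The tuple's creator aborted, so the phantom aborts too.
                self.state[phantom] = "aborted"
            elif self.state[owner] == "committed":
                self.state[phantom] = "aborted"
            elif self.state[owner] == "aborted":
                self.state[phantom] = "committed"

    def commit(self, xid):
        self.state[xid] = "committed"
        self._resolve_phantoms()

    def abort(self, xid):
        self.state[xid] = "aborted"
        # Aborting a higher-level transaction aborts its subtransactions.
        for child, parent in self.parent.items():
            if parent == xid and self.state[child] != "aborted":
                self.abort(child)
        self._resolve_phantoms()


if __name__ == "__main__":
    t = XidTable()
    t.begin(1)
    t.begin(2, parent=1)
    t.add_phantom(3, owner=2, base=1)
    t.abort(1)
    print(t.state)
```

In this model, aborting xid1 cascades to xid2 and forces the phantom
xid3 to aborted, while aborting only xid2 marks xid3 as committed.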

Anyway, this is more of a sketch of a possible way to do this without
extending the tuple header for all transactions.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
