Re: nested transactions

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: nested transactions
Date: 2002-11-28 14:51:53
Message-ID: nv4cuu8sj987ipagso2faoh1soqarj2qm7@4ax.com
Lists: pgsql-hackers

On Wed, 27 Nov 2002 22:47:33 -0500 (EST), Bruce Momjian
<pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>The interesting issue is that if we could set the commit/abort bits all
>at the same time, we could have the parent/child dependency local to the
>backend --- other backends don't need to know the parent, only the
>status of the (subtransaction's) xid, and they need to see all those
>xid's committed at the same time.

You mean the commit/abort bits in the tuple headers? Yes, this would
be interesting, but I see no way this could be done. If it could,
there would be no need for pg_clog.

Reading your paragraph above one more time I think you mean the bits
in pg_clog: Each subtransaction gets its own xid. On ROLLBACK the
abort bits of the aborted (sub)transaction and all its children are
set in pg_clog immediately. This operation does not have to be
atomic. On subtransaction COMMIT nothing happens to pg_clog, the
status is only changed locally, the subtransaction still looks "in
progress" to other backends. Only when the main transaction commits,
we set the commit bits of the main transaction and all its non-aborted
children in pg_clog. This action has to be atomic. Right?
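
A minimal sketch of the bookkeeping described above, as one reading of it and not actual backend code. All names (BackendXact, shared_status, begin_sub and so on) are hypothetical, and pg_clog is modelled as one byte per xid for simplicity. The loop in commit_main is exactly the step that would have to appear atomic to other backends; the sketch does nothing to make it so.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;

enum { ST_IN_PROGRESS = 0, ST_COMMITTED = 1, ST_ABORTED = 2 };

#define N_XIDS       256
#define MAX_SUBXACTS 32

/* Stand-in for pg_clog: what other backends can see (assumption). */
static uint8_t shared_status[N_XIDS];

/* Backend-local view of one main transaction and its subtransactions. */
typedef struct {
    Xid     main_xid;
    int     nsub;
    Xid     sub_xid[MAX_SUBXACTS];
    int     sub_parent[MAX_SUBXACTS];  /* index of parent subxact, -1 = main */
    uint8_t sub_local[MAX_SUBXACTS];   /* status known only to this backend */
} BackendXact;

/* Start a subtransaction; children are always appended after their parent. */
static int begin_sub(BackendXact *b, Xid xid, int parent_idx)
{
    assert(b->nsub < MAX_SUBXACTS && xid < N_XIDS);
    assert(parent_idx >= -1 && parent_idx < b->nsub);
    int i = b->nsub++;
    b->sub_xid[i] = xid;
    b->sub_parent[i] = parent_idx;
    b->sub_local[i] = ST_IN_PROGRESS;
    return i;
}

/* Subtransaction COMMIT: local change only, shared status is untouched. */
static void commit_sub(BackendXact *b, int idx)
{
    if (b->sub_local[idx] == ST_IN_PROGRESS)
        b->sub_local[idx] = ST_COMMITTED;
}

/* ROLLBACK: abort idx and all its descendants, visible immediately. */
static void rollback_sub(BackendXact *b, int idx)
{
    for (int i = idx; i < b->nsub; i++) {
        int p = b->sub_parent[i];
        if (i == idx || (p >= 0 && b->sub_local[p] == ST_ABORTED)) {
            b->sub_local[i] = ST_ABORTED;
            shared_status[b->sub_xid[i]] = ST_ABORTED;  /* need not be atomic */
        }
    }
}

/* Main COMMIT: set the commit status of main and all non-aborted children.
 * This is the multi-bit update that would have to be atomic. */
static void commit_main(BackendXact *b)
{
    shared_status[b->main_xid] = ST_COMMITTED;
    for (int i = 0; i < b->nsub; i++)
        if (b->sub_local[i] != ST_ABORTED)
            shared_status[b->sub_xid[i]] = ST_COMMITTED;
}
```

Until commit_main runs, a locally committed subtransaction still reads as in progress in shared_status, which matches the behaviour described in the paragraph above.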

AFAICS the problem lies in updating several pg_clog bits at once. How
can this be done without holding a potentially long lasting lock?

>You could store the backend slot id in pg_clog rather than the parent
>xid and look up the status of the outer xid for that backend slot. That
>would allow you to use 2 bytes, with a max of 16k backends. The problem
>is that on a crash, the pg_clog points to invalid slots --- it would
>probably have to be cleaned up on startup.

Again I would try to keep pg_clog compact and store the backend slots
in another file, thus not slowing down instances where subtransactions
are not used. Apart from this minor detail I don't see how this is
supposed to work. Could you elaborate?

>But still, you have an interesting idea of just setting the bit to be "I
>am a child".

The idea was to set subtransaction bits in the tuple header. Here is
yet another idea: Let the currently unused fourth state in
pg_clog indicate a committed subtransaction. There are two bits per
transaction, abort (a) and commit (c), with the following meaning:

 a  c
 0  0  transaction in progress, the owning backend knows whether it is
       a main- or a sub-transaction, other backends don't care
 1  0  aborted, nobody cares whether main- or sub-transaction
 0  1  committed main-transaction (*)
 1  1  committed sub-transaction, have to look for parent in
       pg_subtrans

If we allow the 1/1 state to be replaced with 0/1 or 1/0 (on the fly
as a side effect of a visibility check, or by vacuum, or by
COMMIT/ROLLBACK), this could save a lot of parent lookups without
having to touch the xids in the tuple headers.

So (*) should read: committed main-transaction or committed
sub-transaction having a committed parent.
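
As a rough illustration of the four states and of replacing 1/1 on the fly during a status check, here is a sketch under stated assumptions. The names (clog_get, clog_set, xid_status, parent_of) are hypothetical, the arrays are tiny in-memory stand-ins for pg_clog and pg_subtrans, and there is no locking. Treating a 1/1 xid whose parent is still running as "in progress" is one interpretation of the proposal.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Two status bits per xid: bit 1 = abort (a), bit 0 = commit (c). */
enum {
    XS_IN_PROGRESS   = 0x0,  /* a=0 c=0 */
    XS_COMMITTED     = 0x1,  /* a=0 c=1 */
    XS_ABORTED       = 0x2,  /* a=1 c=0 */
    XS_SUB_COMMITTED = 0x3   /* a=1 c=1, parent must be consulted */
};

#define MAX_XIDS 1024

static uint8_t clog[MAX_XIDS / 4];   /* four xids per byte */
static Xid     parent_of[MAX_XIDS];  /* stand-in for pg_subtrans, 0 = none */

static int clog_get(Xid xid)
{
    assert(xid < MAX_XIDS);
    return (clog[xid / 4] >> ((xid % 4) * 2)) & 0x3;
}

static void clog_set(Xid xid, int status)
{
    assert(xid < MAX_XIDS);
    unsigned shift = (xid % 4) * 2;
    unsigned byte = clog[xid / 4];
    byte = (byte & ~(0x3u << shift)) | ((unsigned)status << shift);
    clog[xid / 4] = (uint8_t)byte;
}

/* Status as seen by another backend.  A 1/1 entry is resolved through
 * its parent and overwritten with 0/1 or 1/0 once the parent's fate is
 * final, so later checks need no parent lookup. */
static int xid_status(Xid xid)
{
    int st = clog_get(xid);
    if (st != XS_SUB_COMMITTED)
        return st;

    Xid parent = parent_of[xid];
    assert(parent != 0);             /* a 1/1 xid must have a parent */

    int pst = xid_status(parent);    /* parent may itself be 1/1 */
    if (pst == XS_COMMITTED || pst == XS_ABORTED) {
        clog_set(xid, pst);          /* replace 1/1 on the fly */
        return pst;
    }
    return XS_IN_PROGRESS;           /* parent still running */
}
```

After the first resolving call, the child's entry carries the parent's final state, which is the saving in parent lookups mentioned above.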

>The trick is allowing backends to figure out who's child
>you are. We could store this somehow in shared memory, but that is
>finite and there can be lots of xid's for a backend using
>subtransactions.

The subtrans dependencies have to be visible to all backends. Store
them to disk just like pg_clog. In older proposals I spoke of a
pg_subtrans "table" containing (parent, child) pairs. This was only
meant as a concept, not as a real SQL table subject to MVCC. An
efficient(?) implementation could be an array of parent xids, indexed
by child xid. Most of it can be stolen from the clog code.

One more argument for pg_subtrans being visible to all backends: If
an UPDATE is about to change a tuple touched by another active
transaction, it waits for the other transaction to commit or abort.
We must always wait for the main transaction, not the subtrans.
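
The array-of-parents idea and the lookup of the main transaction to wait for could look roughly like this. It is a sketch only: the names (subtrans_parent, subtrans_set_parent, subtrans_top_level) are hypothetical, the array lives in memory instead of on disk, and it assumes that a parent always has a smaller xid than its child, ignoring xid wraparound.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;

#define INVALID_XID    0u
#define SUBTRANS_SLOTS 1024

/* Parent xid indexed by child xid; INVALID_XID marks a main transaction. */
static Xid subtrans_parent[SUBTRANS_SLOTS];

/* Record that child is a subtransaction of parent. */
static void subtrans_set_parent(Xid child, Xid parent)
{
    assert(child < SUBTRANS_SLOTS);
    /* Assumption: the parent got its xid first, so the chain is acyclic. */
    assert(parent != INVALID_XID && parent < child);
    subtrans_parent[child] = parent;
}

/* Follow the parent chain up to the main transaction.  This is the xid
 * an UPDATE would have to wait for when it hits a tuple touched by xid. */
static Xid subtrans_top_level(Xid xid)
{
    assert(xid < SUBTRANS_SLOTS);
    while (subtrans_parent[xid] != INVALID_XID)
        xid = subtrans_parent[xid];
    return xid;
}
```

With entries 101 -> 100 and 102 -> 101, a waiter that finds xid 102 in a tuple header would end up waiting for xid 100.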

>I still think there must be a clean way,
I hope so ...

> but I haven't figured it out yet.
Are we getting nearer?

Servus
Manfred
