Re: nested transactions

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: nested transactions
Date: 2002-11-28 17:59:21
Message-ID: 200211281759.gASHxL325034@candle.pha.pa.us
Lists: pgsql-hackers

Manfred Koizar wrote:
> On Wed, 27 Nov 2002 22:47:33 -0500 (EST), Bruce Momjian
> <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
> >The interesting issue is that if we could set the commit/abort bits all
> >at the same time, we could have the parent/child dependency local to the
> >backend --- other backends don't need to know the parent, only the
> >status of the (subtransaction's) xid, and they need to see all those
> >xid's committed at the same time.
>
> You mean the commit/abort bit in the tuple headers? Yes, this would
> be interesting, but I see no way how this could be done. If it could,
> there would be no need for pg_clog.
>
> Reading your paragraph above one more time I think you mean the bits
> in pg_clog: Each subtransaction gets its own xid. On ROLLBACK the

Right.

> abort bits of the aborted (sub)transaction and all its children are
> set in pg_clog immediately. This operation does not have to be
> atomic. On subtransaction COMMIT nothing happens to pg_clog, the

Right, going from RUNNING to ABORTED doesn't have to be atomic because
the tuples are invisible to other backends in both states.

> status is only changed locally, the subtransaction still looks "in
> progress" to other backends. Only when the main transaction commits,
> we set the commit bits of the main transaction and all its non-aborted
> children in pg_clog. This action has to be atomic. Right?

Right. We can't have some backends looking at part of the transaction
as committed while at the same time other backends see the transaction
as in progress.

> AFAICS the problem lies in updating several pg_clog bits at once. How
> can this be done without holding a potentially long lasting lock?

Yes, locking is one possible solution, but no one likes that. One hack
lock idea would be to create a subtransaction-only lock, so if you see
the special 4th xact state (about to be committed as part of a
subtransaction) you have to wait on that lock (held by the backend
twiddling the xact bits), then look again. That basically would
serialize all the bit-twiddling for subtransactions. I am sure I am
going to get a "yuck" from the audience on that one, but I am not sure
how long that bit twiddling could take. Does xact bit twiddling ever
cause I/O? I think it could, which would be a pretty big performance
problem. It would serialize the subtransaction commits _and_ block
anyone trying to get the status of those subtransactions. We would not
use the 4th xid status during the transaction, only while we were
twiddling the bits on commit.
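
A rough sketch of that lock idea, in Python rather than backend C, just
to make the sequence concrete. Everything here is hypothetical: the
dict stands in for pg_clog, and the names COMMIT_PENDING, subxact_lock,
commit_with_children and get_status are made up for illustration.

```python
import threading

# Four possible xact states; COMMIT_PENDING is the "special 4th state".
IN_PROGRESS, COMMITTED, ABORTED, COMMIT_PENDING = range(4)

clog = {}                        # xid -> status; stand-in for pg_clog
subxact_lock = threading.Lock()  # hypothetical subtransaction-only lock


def commit_with_children(main_xid, child_xids):
    """Flip the main xid and its non-aborted children to COMMITTED."""
    xids = [main_xid] + [x for x in child_xids if clog.get(x) != ABORTED]
    with subxact_lock:
        # Mark every xid first, so that once any of them reads as
        # COMMITTED, none of the others can still read as IN_PROGRESS.
        for xid in xids:
            clog[xid] = COMMIT_PENDING
        for xid in xids:
            clog[xid] = COMMITTED


def get_status(xid):
    """Return the status of xid, waiting out any commit in flight."""
    status = clog.get(xid, IN_PROGRESS)
    while status == COMMIT_PENDING:
        with subxact_lock:  # blocks until the committer lets go
            pass
        status = clog.get(xid, IN_PROGRESS)  # then look again
    return status
```

The cost is visible in the sketch: every committer with subtransactions
goes through the same lock, and so does every reader that happens to
hit the 4th state.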

> >You could store the backend slot id in pg_clog rather than the parent
> >xid and look up the status of the outer xid for that backend slot. That
> >would allow you to use 2 bytes, with a max of 16k backends. The problem
> >is that on a crash, the pg_clog points to invalid slots --- it would
> >probably have to be cleaned up on startup.
>
> Again I would try to keep pg_clog compact and store the backend slots
> in another file, thus not slowing down instances where subtransactions
> are not used. Apart from this minor detail I don't see how this is
> supposed to work. Could you elaborate?

The trick is that when that 4th status is set, backends looking up the
status all need to point to a central location that can be set for all
of them at once, hence the original idea of putting the parent xid in
the clog file. We don't _need_ to do that, but we do need a way to
_point_ to a central location where the status can be looked up.

> >But still, you have an interesting idea of just setting the bit to be "I
> >am a child".
>
> The idea was to set subtransaction bits in the tuple header. Here is
> yet another different idea: Let the currently unused fourth state in
> pg_clog indicate a committed subtransaction. There are two bits per
> transaction, commit and abort, with the following meaning:
>
> a c
> 0 0  transaction in progress, the owning backend knows whether it is
>      a main- or a sub-transaction, other backends don't care
> 1 0  aborted, nobody cares whether main- or sub-transaction
> 0 1  committed main-transaction (*)
> 1 1  committed sub-transaction, have to look for parent in
>      pg_subtrans
>
> If we allow the 1/1 state to be replaced with 0/1 or 1/0 (on the fly
> as a side effect of a visibility check, or by vacuum, or by
> COMMIT/ROLLBACK), this could save a lot of parent lookups without
> having to touch the xids in the tuple headers.

Yes, you could do that, but we can easily just set the clog bits
atomically, and it will not be needed --- the tuple bits really don't
help us, I think.
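
For reference, the quoted four-state scheme with on-the-fly replacement
of the 1/1 state might look roughly like this. It is only a sketch
under assumptions: clog and subtrans are plain dicts standing in for
pg_clog and the proposed pg_subtrans, and resolve is a made-up name.

```python
# (abort bit, commit bit) pairs, following the quoted table.
IN_PROGRESS = (0, 0)
ABORTED = (1, 0)
COMMITTED = (0, 1)
SUB_COMMITTED = (1, 1)  # committed subtransaction: parent decides


def resolve(xid, clog, subtrans):
    """Return the effective status of xid as other backends would see it.

    clog maps xid -> bit pair; subtrans maps child xid -> parent xid.
    """
    status = clog.get(xid, IN_PROGRESS)
    if status != SUB_COMMITTED:
        return status
    # A committed subtransaction takes on the fate of its parent,
    # which may itself be a committed subtransaction.
    parent_status = resolve(subtrans[xid], clog, subtrans)
    if parent_status in (COMMITTED, ABORTED):
        # Replace 1/1 with the final state so that later visibility
        # checks can skip the parent lookup.
        clog[xid] = parent_status
    return parent_status
```

While the outermost parent is still in progress, the child resolves to
in progress as well and its 1/1 entry is left alone.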

> So (*) should read: committed main-transaction or committed
> sub-transaction having a committed parent.
>
> >The trick is allowing backends to figure out whose child
> >you are. We could store this somehow in shared memory, but that is
> >finite and there can be lots of xid's for a backend using
> >subtransactions.
>
> The subtrans dependencies have to be visible to all backends. Store
> them to disk just like pg_clog. In older proposals I spoke of a
> pg_subtrans "table" containing (parent, child) pairs. This was only
> meant as a concept, not as a real SQL table subject to MVCC. An
> efficient(?) implementation could be an array of parent xids, indexed
> by child xid. Most of it can be stolen from the clog code.

OK, we put it in a file. And how do we efficiently clean it up?
Remember, it is only to be used for a _brief_ period of time. I think a
file system solution is doable if we can figure out a way not to create
a file for every xid. Do we spin through the files (one per outer
transaction) looking for a matching xid when we see that 4th xact state?

Maybe we write the xid's to a file in a special directory in sorted
order, and backends can do a btree search of each file in that directory
looking for the xid, and then knowing the master xid, look up that
status, and once all the children xid's are updated, you delete the
file.
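
Something like the following, as a very rough sketch. It assumes one
file per outer transaction, named after the master xid and holding the
child xids as sorted 4-byte integers; the layout and the function names
are invented for illustration. It also uses a plain binary search over
the sorted xids rather than a real btree, and reads each file whole
where a backend would seek.

```python
import os
import struct
import tempfile
from bisect import bisect_left

XID = struct.Struct("<I")  # assumed on-disk form: 4-byte unsigned xid


def write_subxact_file(directory, master_xid, child_xids):
    """Write the sorted child xids to a file named after the master xid."""
    path = os.path.join(directory, str(master_xid))
    with open(path, "wb") as f:
        for xid in sorted(child_xids):
            f.write(XID.pack(xid))
    return path


def find_master(directory, xid):
    """Search every file in the directory for xid; return its master xid."""
    for name in os.listdir(directory):
        with open(os.path.join(directory, name), "rb") as f:
            xids = [x for (x,) in XID.iter_unpack(f.read())]
        i = bisect_left(xids, xid)  # binary search in the sorted list
        if i < len(xids) and xids[i] == xid:
            return int(name)
    return None  # no file lists this xid


with tempfile.TemporaryDirectory() as d:
    path = write_subxact_file(d, 100, [105, 101, 103])
    master = find_master(d, 103)  # then look up the status of `master`
    os.remove(path)               # all children updated: delete the file
```

The spin through the directory is the part that worries me: it is one
file open per outer transaction that currently has a file there.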

> One more argument for pg_subtrans being visible to all backends: If
> an UPDATE is about to change a tuple touched by another active
> transaction, it waits for the other transaction to commit or abort.
> We must always wait for the main transaction, not the subtrans.

Yes, but again, the xid status of subtransactions is only updated just
before commit of the main transaction, so there is little value to
having those visible.

Let's keep going!

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
