On Fri, 29 Nov 2002 13:33:28 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>Unfortunately this discussion is wrong. User-level visibility checks
>will usually have to fetch the parentxid in case 01 as well, because
>even if the parent is committed, it might not be visible in our
Or we don't allow a subtransaction's status to be updated from 11 to
01 until we know that the main transaction is visible to all active
transactions. I didn't check whether this is expensive to find out. At
least it should be doable by VACUUM.
>Snapshots will record only topmost-parent XIDs (because
>that's what we can find in the PG_PROC array, and anything else would
>create atomicity problems anyway). So we must chase to the topmost
>parent before testing visibility.
BTW, I think this *forces* us to replace the sub xid with the
respective main xid in a tuple header when we set
XMIN/MAX_IS_COMMITTED. Otherwise we'd have to look up the main xid
whenever a tuple is touched.
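
To illustrate the parent chase discussed above, here is a minimal sketch in
Python. Everything in it is hypothetical: the dictionary `parent_of`, the
sets `snapshot_in_progress` and `committed`, and both function names are
stand-ins for illustration, not PostgreSQL structures. It also ignores the
snapshot's xmin/xmax bounds and only shows why a sub xid stored in a tuple
header costs a lookup on every visibility test.

```python
def topmost_parent(xid, parent_of):
    """Follow parent links until we reach an xid with no recorded parent."""
    while xid in parent_of:
        xid = parent_of[xid]
    return xid


def visible_in_snapshot(xid, parent_of, snapshot_in_progress, committed):
    """Simplified test: the snapshot lists only topmost-parent xids, so a
    sub xid must be mapped to its topmost parent before it can be checked."""
    top = topmost_parent(xid, parent_of)
    return top in committed and top not in snapshot_in_progress


# Hypothetical example: 102 is a child of 101, which is a child of 100.
parent_of = {101: 100, 102: 101}
```

If the tuple header already held the main xid (100 here), the loop in
`topmost_parent` would exit immediately and no parent lookup would be needed.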
>Also, using a 11 state doubles the amount of pg_clog I/O needed to
>commit a collection of subtransactions.
Is a pg_clog page written out to disk each time a bit is changed? I'd
expect some locality.
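
A rough way to see the locality I mean, assuming two status bits per
transaction and a block size of 8192 bytes (both are assumptions of this
sketch, as are all names in it): consecutive xids map to the same page, so a
batch of subtransactions started close together usually dirties a single
page. Whether that page is written out once or several times depends on
buffer management, which the sketch does not model.

```python
BLOCK_SIZE = 8192                              # assumed page size in bytes
BITS_PER_XACT = 2                              # two status bits per xid
XACTS_PER_BYTE = 8 // BITS_PER_XACT            # 4 xids per byte
XACTS_PER_PAGE = BLOCK_SIZE * XACTS_PER_BYTE   # 32768 xids per page


def clog_position(xid):
    """Return (page number, byte offset in page, bit shift in byte)."""
    page = xid // XACTS_PER_PAGE
    byte = (xid % XACTS_PER_PAGE) // XACTS_PER_BYTE
    shift = (xid % XACTS_PER_BYTE) * BITS_PER_XACT
    return page, byte, shift


def pages_touched(xids):
    """Set of status pages that must be modified to update these xids."""
    return {clog_position(x)[0] for x in xids}
```

With these numbers, fifty consecutive xids touch one page unless they happen
to straddle a multiple of 32768.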
>I think it would be preferable to use only three states: active,
>aborted, committed. The parent commit protocol is (1) write 10 as state
>of each aborted subtransaction (this should be done as soon as the
>subtransaction is known aborted, rather than delaying to parent commit);
>(2) write 01 as state of parent (this is the atomic commit); (3) write
>01 as state of each committed subtransaction. Readers who see 00 must
>check the parent state; if the parent is committed then they have to go
>back and recheck the child state (to see if it became "aborted" after
Nice idea! This saves the fourth status for future use (for example,
Firebird uses it for two-phase commit). OTOH, for the reasons you
mentioned above, there's no chance to save parent xid lookups if we go
>This halves the write traffic during a commit, at the
>cost of additional read traffic when subtransaction state is checked in
>a narrow window after the time of parent transaction commit. I believe
>it nets out to be faster.
Maybe. The whole point of my approach is: if we can limit the active
range of transactions requiring parent xid lookups to a small fraction
of the range needing pg_clog lookups, then it makes sense to store
status bits and parent xids in different files. Otherwise keeping
them together in one file is clearly faster.
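
For reference, the three-state protocol quoted above can be sketched roughly
as follows. This is only my reading of it, written in Python with
hypothetical names (`Clog`, `abort_sub`, `commit_parent`, `reader_status`).
Two points are assumptions of the sketch and not stated in the quoted text: a
child that is still 00 after its parent committed is treated as committed,
and a child whose parent aborted is treated as aborted.

```python
IN_PROGRESS, COMMITTED, ABORTED = 0b00, 0b01, 0b10


class Clog:
    """Toy status log: xid -> 2-bit state, plus a sub xid -> parent map."""

    def __init__(self):
        self.status = {}
        self.parent = {}
        self.writes = 0   # counts status updates, to compare write traffic

    def get(self, xid):
        return self.status.get(xid, IN_PROGRESS)

    def set(self, xid, state):
        self.status[xid] = state
        self.writes += 1


def abort_sub(clog, xid):
    # Step 1: mark an aborted subtransaction as soon as the abort is known.
    clog.set(xid, ABORTED)


def commit_parent(clog, parent, committed_subs):
    # Step 2: the single atomic commit point.
    clog.set(parent, COMMITTED)
    # Step 3: afterwards, mark each committed subtransaction.
    for sub in committed_subs:
        clog.set(sub, COMMITTED)


def reader_status(clog, xid):
    state = clog.get(xid)
    if state != IN_PROGRESS:
        return state
    parent = clog.parent.get(xid)
    if parent is None:
        return IN_PROGRESS
    parent_state = reader_status(clog, parent)
    if parent_state == COMMITTED:
        # Recheck the child: it may have been marked since the first read.
        state = clog.get(xid)
        return ABORTED if state == ABORTED else COMMITTED
    return parent_state   # parent still active, or aborted (assumption)
```

Each subtransaction gets exactly one status write here, which is where the
halved write traffic comes from; the price is the parent lookup in
`reader_status` for readers that arrive between steps 2 and 3.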