Re: [PATCHES] Static snapshot data

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Static snapshot data
Date: 2003-05-13 19:57:39
Message-ID: f2e2cvc2vrk82i76tsanqhcjskdc1v9ls3@4ax.com
Lists: pgsql-hackers pgsql-patches

On Mon, 12 May 2003 23:55:31 -0400, Alvaro Herrera
<alvherre(at)dcc(dot)uchile(dot)cl> wrote:
>On Mon, May 12, 2003 at 09:40:37AM -0400, Tom Lane wrote:
>> > Our (Alvaro's and my) current understanding is that snapshots are not
>> > influenced by nested transactions.
>>
>> What was that long article Alvaro posted yesterday, then?

Unfortunately I sent my reply to Tom before I read my inbox :-(

>[...] if the reasoning below is
>correct, we can get away with static Serializable- and QuerySnapshots.
>
>I don't think it makes sense to change the isolation level for a
>non-toplevel transaction. That is, if the topmost transaction is
>ISOLATION LEVEL SERIALIZABLE, all its child transactions will be. And
>if it's not, then there's no way to make its child transactions be so.

I agree.

Tom replied:
|I have a feeling that there might be some value in running a
|SERIALIZABLE subtransaction inside a READ COMMITTED parent.

Never thought of that, most probably due to my notion of nested
transactions (which may be weird). Let's try to sort it out. Here is
my view; it's not much more than three simple rules:

Rule 1) Subtransactions can be nested to arbitrary levels. During
execution of a subtransaction there is no change to the state of the
enclosing transaction.

Rule 2) On subtransaction ROLLBACK, changes done by the
subtransaction are effectively undone.

Rule 3) After subtransaction COMMIT, changes done by the
subtransaction are effectively treated as if done by the enclosing
transaction.

So this sequence of commands

BEGIN;  -- main transaction T1
  query 1;
  BEGIN;  -- subtransaction T1.1
    query 2;
    BEGIN;  -- subtransaction T1.1.1
      query 3;
    ROLLBACK;  -- subtransaction T1.1.1
  COMMIT;  -- subtransaction T1.1
  query 4;
  BEGIN;  -- subtransaction T1.2
    query 5;
  ROLLBACK;  -- subtransaction T1.2
  query 6;
COMMIT;  -- main transaction T1

effectively behaves like

BEGIN; -- main transaction T1
query 1;
query 2;
query 4;
query 6;
COMMIT; -- main transaction T1

I.e. it does not matter whether query 2 has been issued inside the
(later committed) subtransaction T1.1 or directly in the main
transaction T1. Query 3 and query 5 which are part of aborted
subtransactions (T1.1.1 and T1.2) look as if they had never been
issued.
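
The three rules can be sketched as a small simulation. This is only an illustration of the intended semantics, not backend code: the function name flatten and the list-of-strings command format are made up for this example.

```python
def flatten(commands):
    """Return the queries that effectively ran, per rules 2 and 3.

    `commands` is a list of strings: "BEGIN", "COMMIT", "ROLLBACK",
    or anything else, which is treated as a query. The outermost
    BEGIN/COMMIT pair is the main transaction. (Hypothetical format.)
    """
    stack = []  # one list of pending queries per open (sub)transaction
    for cmd in commands:
        if cmd == "BEGIN":
            stack.append([])
        elif cmd == "ROLLBACK":
            stack.pop()  # rule 2: everything the subtransaction did is undone
            if not stack:
                return []  # the main transaction itself was rolled back
        elif cmd == "COMMIT":
            done = stack.pop()
            if not stack:
                return done  # main transaction committed
            stack[-1].extend(done)  # rule 3: changes now belong to the parent
        else:
            stack[-1].append(cmd)
    return []  # main transaction never committed


example = [
    "BEGIN", "query 1",
    "BEGIN", "query 2",
    "BEGIN", "query 3", "ROLLBACK",
    "COMMIT",
    "query 4",
    "BEGIN", "query 5", "ROLLBACK",
    "query 6",
    "COMMIT",
]
result = flatten(example)
print(result)  # ['query 1', 'query 2', 'query 4', 'query 6']
```

Rule 1 shows up in the fact that an open subtransaction only ever touches its own entry on the stack; the parent's entry changes only at subtransaction COMMIT.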

I'm inclined to include "SET TRANSACTION ISOLATION LEVEL" in the kind
of changes that rules 2 and 3 deal with. Perhaps the rules should say
"commands executed" instead of "changes done". This would forbid
running parts of the same main transaction with different isolation
levels.

>With "constant isolation" in mind, it's clear that the
>SerializableSnapshot is going to be constant for all transactions. We
>don't need to calculate different SerializableSnapshots for child
>transactions; thus going with a static variable for SerializableSnapshot
>isn't wrong.
|
|But ... your definition of the snapshot includes the list of successful
|previous subtransactions of the parent.

Apart from possible (future) performance hacks, I see no need to
include a list of completed subtransactions in any local data
structure, neither Snapshots nor TransactionStates. We do not
enumerate the children of a transaction. We look in the other
direction: when we check visibility we find a transaction id in a
tuple header and want to know its parent transaction id. This
question can be answered by pg_subtrans which will be built on top of
the SimpleLRU patch submitted a few days ago.
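
A minimal sketch of the child-to-parent lookup, assuming pg_subtrans behaves like a plain map from a subtransaction's xid to its parent's xid. The dict and the helper name topmost_xid are stand-ins for illustration; the real structure is meant to live on disk on top of SimpleLRU.

```python
# Hypothetical in-memory stand-in for pg_subtrans: child xid -> parent xid.
# Top-level transactions have no entry.
pg_subtrans = {2: 1, 3: 1, 5: 4, 4: 1}


def topmost_xid(xid, subtrans):
    """Follow parent links until a top-level transaction is reached."""
    while xid in subtrans:
        xid = subtrans[xid]
    return xid


print(topmost_xid(3, pg_subtrans))  # 1
print(topmost_xid(5, pg_subtrans))  # 1 (via 5 -> 4 -> 1)
```

Note that nothing here can list the children of xid 1 without scanning the whole map, which matches the point that navigation only goes from child to parent.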

| How's that static?

>And about QuerySnapshots: given some running transaction with a given
>QuerySnapshot, a newly created child transaction's first QuerySnapshot
>can be calculated easily as:
>
>- Xmin, Xmax and xip are the same as in the current implementation
> (i.e. the values from GetSnapshotData)

Yes, and this overwrites the current QuerySnapshot.

>- childxact is my parent's childxact
>- parentxact is created by adding my parent XID to my parent's
> parentxact

No need for childxact and parentxact (see below).

|Not entirely sure about that in READ COMMITTED mode. Should a child
|xact be able to see commits from other backends that happened before it
|started, but after its parent started?

Why not? Even if there is no subtransaction, a new *query* sees
commits from other backends. I don't see why query 2 in the right
case should see commits that are invisible to query 2 in the left
case.

    BEGIN;              BEGIN;
    query 1;            query 1;
    BEGIN;              --
    query 2;            query 2;

|I can think of arguments on both sides ...

??

>And given some non-topmost ending transaction, its parent transaction
>next QuerySnapshot can be calculated as:

A QuerySnapshot is always taken at the start of a query. It does not
depend on the transaction nesting level.

>Thus we don't need to keep track of multiple QuerySnapshots either --
>the new one can always be calculated from the last one.

Not only "from the last one" but "independently of the last one".
GetSnapshotData does not care about subtransactions.

>We need to know all the XIDs that were completed within the same topmost
>transaction, because all of them have to be taken into consideration for
>the visibility rules.

pg_subtrans keeps track of that (sort of, because it can navigate from
child to parent but not vice versa).

> IOW, we have to consider all of them like they
>were only one transaction, discarding the changes made by the ones that
>were aborted.

Okay, cf. rules 2 and 3.

[While we are at it, I continue with some comments on Alvaro's other
message.]

On Sun, 11 May 2003 19:29:27 -0400, Alvaro Herrera
<alvherre(at)dcc(dot)uchile(dot)cl> wrote:
:In the current implementation, it's sufficient to know
:a) what transactions come before me (Xmin),
:b) what transactions come after me (Xmax),
:c) what transactions are in progress (xip), and
:d) what commands come before me in the current transaction
: (curcid)

I propose that we don't change this, except that d) should say "... in
the current transaction tree".

:In the nested transactions case, we also need to know
:
:e) what subtransactions of my own parent transactions come before me,
: and
:f) what commands of my parent transactions come before me.

Yes, we need to have this information. But that doesn't mean we have
to store it in a snapshot.

ad e) I can't see a need to directly answer this question. What we
need is e') Does a given xid belong to the current xact tree?
This can be answered using pg_subtrans and the transaction information
stack (see below).

ad f) I'd write this as:
f') What commands of my transaction tree come before me?

:Consider the following scenario:
:
:BEGIN; xid=1
:CREATE TABLE a (p int UNIQUE, q int); xid=1 cid=1
:INSERT INTO a (p) VALUES (1); xid=1 cid=2
:BEGIN; xid=2
: -- should fail due to unique constraint
: INSERT INTO a (p) VALUES (1); xid=2 cid=1
:ROLLBACK;
:BEGIN; xid=3
: INSERT INTO a (p) VALUES (2); cid=1
: DELETE FROM a WHERE p=1; cid=2
: -- "a" should have 1 tuple
:COMMIT;
:-- should work, because the old tuple doesn't exist anymore
:INSERT INTO a (p) VALUES (1); xid=1 cid=3
:COMMIT;

It might help if we continue to increment cid across subtransaction
boundaries.

BEGIN;                                   xid=1
CREATE TABLE a (p int UNIQUE, q int);    xid=1 cid=1
INSERT INTO a (p) VALUES (1);            xid=1 cid=2
BEGIN;                                   xid=2
  -- should fail due to unique constraint
  INSERT INTO a (p) VALUES (1);          xid=2 cid=1 -> 3
ROLLBACK;
BEGIN;                                   xid=3
  INSERT INTO a (p) VALUES (2);          cid=1 -> 4
  DELETE FROM a WHERE p=1;               cid=2 -> 5
  -- "a" should have 1 tuple
COMMIT;
-- should work, because the old tuple doesn't exist anymore
INSERT INTO a (p) VALUES (1);            xid=1 cid=3 -> 6
COMMIT;
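
The numbering above can be reproduced with a toy model in which each BEGIN gets a fresh xid but the command counter belongs to the whole transaction tree. The class and method names are invented for this sketch, and it assumes a single backend, so xids are simply handed out as 1, 2, 3.

```python
class TransactionTree:
    """Toy model: per-(sub)transaction xids, one tree-wide command counter."""

    def __init__(self):
        self.next_xid = 1    # assumption: no other backend consumes xids
        self.stack = []      # xids of the currently open (sub)transactions
        self.command_id = 0  # never reset at subtransaction boundaries

    def begin(self):
        self.stack.append(self.next_xid)
        self.next_xid += 1

    def end(self):
        """COMMIT or ROLLBACK of the innermost (sub)transaction."""
        self.stack.pop()

    def run_command(self):
        """Return the (xid, cid) pair stamped on tuples by this command."""
        self.command_id += 1
        return (self.stack[-1], self.command_id)


tree = TransactionTree()
stamps = []
tree.begin()                       # xid=1
stamps.append(tree.run_command())  # CREATE TABLE
stamps.append(tree.run_command())  # INSERT p=1
tree.begin()                       # xid=2
stamps.append(tree.run_command())  # failing INSERT
tree.end()                         # ROLLBACK
tree.begin()                       # xid=3
stamps.append(tree.run_command())  # INSERT p=2
stamps.append(tree.run_command())  # DELETE
tree.end()                         # COMMIT
stamps.append(tree.run_command())  # final INSERT p=1
print(stamps)  # [(1, 1), (1, 2), (2, 3), (3, 4), (3, 5), (1, 6)]
```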

:Here, the QuerySnapshot of xid 1, at the time of cid=3 [6] should see the
:results of execution from xid 3, but it is not before Xmin, and it's
:after Xmax, and is not in the xip array.

This will be handled by HeapTupleSatisfiesXxxx using pg_subtrans:

. We find a tuple (having p=2) with xmin=3.

. In pg_clog we find that xact 3 is a committed subtransaction.

. We look up xact 3's parent transaction in pg_subtrans and get
parent xact = 1.

. Consulting the transaction information stack we find out that
xact 1 is one of our own currently active transactions (in this
case the only one).

. Because the tuple's cmin (4) is less than CurrentCommandId (6)
the tuple is visible.

The snapshot is only consulted for transactions outside our own
transaction tree. This is a natural extension to the current logic,
where we check for IsCurrentTransactionId before we look at the
snapshot.
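
The steps above can be sketched roughly as follows. Everything here is a simplified stand-in: pg_clog and pg_subtrans are plain dicts, the transaction information stack is a list of xids, only the inserting side (xmin/cmin) is checked, and the function name is made up. It returns None when the xid is not part of our own tree, meaning the snapshot has to decide.

```python
from dataclasses import dataclass

COMMITTED, ABORTED, IN_PROGRESS = "committed", "aborted", "in progress"


@dataclass
class HeapTuple:
    xmin: int  # xid of the inserting (sub)transaction
    cmin: int  # command id of the inserting command


def visible_in_own_tree(tup, clog, subtrans, xact_stack, current_cid):
    """True/False if tup.xmin belongs to our tree, None if it does not."""
    xid = tup.xmin
    while xid not in xact_stack:
        status = clog.get(xid, IN_PROGRESS)
        if status == ABORTED:
            return False      # rule 2: effects of a rolled-back xact are undone
        if status == IN_PROGRESS or xid not in subtrans:
            return None       # somebody else's transaction: ask the snapshot
        xid = subtrans[xid]   # committed subtransaction: continue with parent
    # One of our own active transactions: only command order matters now.
    return tup.cmin < current_cid


# State at the time of the last INSERT (cid 6) in the scenario above.
clog = {2: ABORTED, 3: COMMITTED}  # xact 1 is still in progress
subtrans = {2: 1, 3: 1}
stack = [1]

print(visible_in_own_tree(HeapTuple(3, 4), clog, subtrans, stack, 6))  # True
print(visible_in_own_tree(HeapTuple(2, 3), clog, subtrans, stack, 6))  # False
```

A tuple inserted by a committed top-level transaction of another backend falls through to the None case, which is where the unchanged snapshot logic takes over.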

:Also, the QuerySnapshot of xid 3, should see the results of commands
:from xid 1 just like they'd be seen if they were in the same xact but
:with a lesser CommandId.

Yes, because if we find a tuple with xmin still active, we look for
this xid on our transaction information stack.

:Both cases are not implementable with the current notion of a Snapshot.

I think they are. What we need is not an extension to the snapshot
structure, but a transaction information stack holding transaction
specific information: TransactionId, TransState, TBlockState, ...

This looks almost like struct TransactionStateData, except that
commandId, startTime, and startTimeUsec belong only to the main
transaction.

:I'm not sure what the SerializableSnapshot should be. It does need to
:take into account the changes made by previous committed
:subtransactions, right?

Per rule 3 previous committed subtransactions are equivalent to
previous queries. So whether their effects are visible depends on the
current query snapshot.

Consider this

UPDATE a SET q = 1 WHERE ...; -- xid=1 cid=7

when there is a TRIGGER BEFORE UPDATE FOR EACH ROW containing:

...
BEGIN; -- subtransaction
UPDATE ...;
COMMIT;
...

While the second tuple is processed, visibility rules for the effects
of the trigger executed for the first tuple are the same as if the
trigger had executed its UPDATE without wrapping it into a
subtransaction.

:It's also clear that we need to differentiate a parent's QuerySnapshot
:from their child's.

No, a QuerySnapshot is taken at the start of a query ...

: It's not clear to me what should be done in the
:case of a SerializableSnapshot.

A SerializableSnapshot is the first snapshot taken during a
*transaction tree*.
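
As a rough sketch of that view: one pair of static snapshots per transaction tree, refreshed at query start regardless of nesting depth. The class below is illustrative only; get_snapshot_data is a caller-supplied stand-in for GetSnapshotData, and the boolean flag stands in for the isolation level of the main transaction.

```python
class SnapshotManager:
    """Toy model of static Serializable- and QuerySnapshots."""

    def __init__(self, get_snapshot_data):
        self.get_snapshot_data = get_snapshot_data  # stand-in, not the real call
        self.serializable_snapshot = None
        self.query_snapshot = None

    def start_query(self, serializable):
        """Called at the start of every query, at any nesting level."""
        if self.serializable_snapshot is None:
            # First snapshot taken during the transaction tree.
            self.serializable_snapshot = self.get_snapshot_data()
            self.query_snapshot = self.serializable_snapshot
        elif serializable:
            self.query_snapshot = self.serializable_snapshot
        else:
            # READ COMMITTED: a fresh snapshot, independent of the last one.
            self.query_snapshot = self.get_snapshot_data()
        return self.query_snapshot


# Use a counter so that each "snapshot" is distinguishable.
counter = iter(range(100))
rc = SnapshotManager(lambda: next(counter))
rc_snaps = [rc.start_query(serializable=False) for _ in range(3)]

counter2 = iter(range(100))
ser = SnapshotManager(lambda: next(counter2))
ser_snaps = [ser.start_query(serializable=True) for _ in range(3)]

print(rc_snaps, ser_snaps)  # [0, 1, 2] [0, 0, 0]
```

Nothing in start_query looks at the nesting level, which is the whole point: subtransaction BEGIN, COMMIT and ROLLBACK do not touch either snapshot.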

Servus
Manfred
