Re: Serializable Snapshot Isolation

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <gsstark(at)mit(dot)edu>
Cc: <drkp(at)csail(dot)mit(dot)edu>,<heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-25 15:24:27
Message-ID: 4C9DCDDD0200002500035DA2@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Stark wrote:
> Kevin Grittner wrote:

>> One place I'm particularly interested in using such a feature is
>> in pg_dump. Without it we have the choice of using a SERIALIZABLE
>> transaction, which might fail or cause failures (which doesn't
>> seem good for a backup program) or using REPEATABLE READ (to get
>> current snapshot isolation behavior), which might capture a view
>> of the data which contains serialization anomalies.
>
> I'm puzzled how pg_dump could possibly have serialization
> anomalies. Snapshot isolation gives pg_dump a view of the database
> containing all modifications committed before it started and no
> modifications which committed after it started. Since pg_dump makes
> no database modifications itself it can always just be taken to
> occur instantaneously before any transaction which committed after
> it started.

Well, in the SQL-92 standard[1] the definition of serializable
transactions was changed to the following:

| The execution of concurrent SQL-transactions at isolation level
| SERIALIZABLE is guaranteed to be serializable. A serializable
| execution is defined to be an execution of the operations of
| concurrently executing SQL-transactions that produces the same
| effect as some serial execution of those same SQL-transactions. A
| serial execution is one in which each SQL-transaction executes to
| completion before the next SQL-transaction begins.

It hasn't changed since then.

Some people in the community cling to the notion, now obsolete for
almost two decades, that serializable transactions are defined by the
same anomalies which define the other three levels of transaction
isolation. (It's probably time to catch up on that one.) Note that
the above does *not* say "it's OK for a SELECT in a transaction
executing at the serializable isolation level to produce results
which are not consistent with any serial execution of serializable
transactions, as long as the database *eventually* reaches a state
where a repeat of the same SELECT in a new transaction produces
results consistent with such execution." Under the standard, even
read-only transactions have to follow the rules, and I think most
people would want that.

Now, commit order in itself doesn't directly affect the apparent
order of execution. It's only directs the apparent order of
execution to the extent that multiple transactions access the same
data. Read-read "conflicts" don't matter in this scheme, and
write-write conflicts are already handled under snapshot isolation by
preventing both writers from committing -- if one commits, the other
is forced to roll back with a serialization failure. That leaves
write-read and read-write conflicts.

In write-read conflicts, one transaction writes data and then commits
in time for another transaction to see it. This implies that the
writing transaction came first first in the apparent order of
execution. Now for the tricky one: in read-write conflicts (often
written as rw-conflict) the reading transaction cannot see the write
of a concurrent transaction because of snapshot visibility rules.
Since the reading tranaction is unable to see the work of the writing
transaction, it must be considered to have come first in the apparent
order of execution.

In order to have a serialization anomaly under snapshot isolation,
you need a situation like this: a transaction which I'll call T0
(matching much discussion on the topic published in recent years) has
a rw-dependency on a transaction concurrent to T0, which I'll call
T1. In addition, T1 has a rw-dependency on a transaction which is
concurrent to it, which I'll call TN. The reason it's not T2 is that
it can be the same transaction as T0 or a third transaction. So,
because of the rw-conflicts, T0 appears to execute before T1, which
appears to execute before TN. (At this point it should be obvious
why you've got a problem if T0 and TN are the same transaction.)

If T0 and TN are distinct, we still haven't quite met the conditions
required to produce a serialization anomaly, however, The next
requirement is that TN (the third in apparent order of execution)
actually commits first. At this point, the third transaction's
writes are exposed for the world to see, while there are still two
uncommitted tranactions which appear to have committed first. There
are so many ways that this can lead to a cycle in apparent order of
execution, some of which can happen in the client application, that
Serializable Snapshot Isolaiton (SSI) doesn't pretend to track that.

Barring one exception that I worked out myself (although I'd be
shocked if someone didn't beat me to it and I just haven't found a
paper describing it in my researches), the above describes the
conditions under which one of the transactions must be rolled back to
prevent a serialization anomaly.

The exception is interesting here, though. It is that if T0 is a
READ ONLY transaction, it can't participate in an anomaly unless TN
commits before T0 acquires its snapshot. This observation is based
on the fact that since a READ ONLY transaction can't appear to come
after another transaction based on a rw-conflict, I can see only two
ways that it can appear to come after TN and thereby complete the
cycle in the apparent order of execution:
(1) There is a wr-conflict where T0 successfully reads a write from
TN, or
(2) application software outside the database receives confirmation
of the commit of TN before it starts T0.

[If anyone can see a way to complete the cycle in apparent order of
execution when T0 is READ ONLY without having TN commit before T0
acquires its snapshot, please let me know. I'm basing several
optimizations on this, and it is an innovation I have not yet found
mentioned in the literature.]

OK, to get back to the question -- pg_dump's transaction (T0) could
see an inconsistent version of the database if one transaction (TN)
writes to a table, another transaction (T1) overlaps TN and can't
read something written by TN because they are concurrent, TN commits
before T0 acquires its snapshot, T1 writes to a table, T0 starts
before T1 commits, and T0 can't read something which T1 wrote (which
is sort of a given for a database dump and overlapping transactions).

So that's the theory. For a practical example, consider the
receipting example which I've posted to the list multiple times
before. (TN updates a control record with a deposit date, and T1 is
a receipt which uses the deposit date from the control record.) In
this not-atypical accounting system, any date before the current
deposit date is "closed" -- the receipts for each previous date were
set from a control record before it advanced. If pg_dump's
transaction plays the role of T0 here, it could capture the updated
control record, but not a receipt based on the prior value of that
control record. That is, it captures the effects of a *later*
transaction, but not the *earlier* one.

One last thing -- if anyone who has read the Serializable Wiki page
didn't already understand the reason why I had a concern about
pg_dump here, and the above helped clarify it, feel free to draw from
this and update the Wiki page, or let me know what was helpful, and I
will.

-Kevin

[1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2010-09-25 15:34:56 Re: Serializable Snapshot Isolation
Previous Message Tom Lane 2010-09-25 14:45:28 Re: Serializable Snapshot Isolation