Documenting serializable vs snapshot isolation levels

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Documenting serializable vs snapshot isolation levels
Date: 2008-12-30 00:13:43
Message-ID: 49591357.EE98.0025.0@wicourts.gov
Lists: pgsql-hackers

I'm still casting about to make sure I have my head around the issues
adequately to suggest a documentation update. Here's my current
understanding.

The below is intended to help define the nature and scope of the
issue, not be the sort of text that belongs in user documentation.

Assume transactions T0, T1, and TN, where TN can be T1 or a different
transaction, and T0 is concurrent with T1 and TN.
Assume non-overlapping logical sets of data x and y (either or both of
which may be empty).

Differences in behavior between what is required by the standard for
serializable transactions and what occurs under snapshot isolation can
manifest when all of the following hold:

 - T0 reads x and modifies y based on what was read from x;
 - T1 modifies x in a way that would affect T0's modification to y;
 - TN reads y; and
 - T1's modifications are visible to TN, either because it is the same
   transaction or because T1 committed before TN got its snapshot.
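
For concreteness, a minimal sketch of that pattern as interleaved
sessions might look like the following (table and column names are
hypothetical; note that requesting SERIALIZABLE in PostgreSQL currently
provides snapshot isolation, which is why both commits succeed):

  -- Session A (T0)
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT val FROM x_table WHERE id = 1;       -- reads x; sees, say, 100

  -- Session B (T1)
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  UPDATE x_table SET val = 0 WHERE id = 1;    -- modifies x
  COMMIT;

  -- Session A (T0), continued
  UPDATE y_table SET val = 100 WHERE id = 1;  -- modifies y based on the x it read
  COMMIT;                                     -- succeeds under snapshot isolation

Any TN which starts after both commits (or which is T1 itself) sees
T1's version of x next to a y derived from the old x, a state no
serial ordering of the transactions could produce.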

I think that transient SELECT anomalies can occur if TN also reads x
without modifying anything based on y. My example of an incomplete
receipt journal is a case of this.
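
Roughly, the receipt case maps onto that pattern like this (the
deposit_date column and the specific values are made up for
illustration; control and receipt are the tables referred to in the
locking discussion below):

  -- T0: take a receipt, stamping it with the deposit date in effect
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT deposit_date FROM control WHERE id = 1;    -- reads x; sees 2008-12-29
  INSERT INTO receipt (receipt_no, deposit_date, amount)
    VALUES (1001, '2008-12-29', 100.00);            -- modifies y based on x
  -- not yet committed

  -- T1: close out the day by advancing the deposit date
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  UPDATE control SET deposit_date = '2008-12-30' WHERE id = 1;  -- modifies x
  COMMIT;

  -- TN: list the receipts for the date which now appears closed
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT deposit_date FROM control WHERE id = 1;    -- sees 2008-12-30
  SELECT * FROM receipt WHERE deposit_date = '2008-12-29';
  COMMIT;                                           -- misses receipt 1001

  -- T0 finally commits, adding a 2008-12-29 receipt after the journal
  -- for that date has already been printed
  COMMIT;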

I think that persistent data integrity can be compromised when TN
modifies data based on y. The example currently in the documentation
is a special case of this, where the modified data is part of x.
Another example would be where someone attempted to maintain
referential integrity under snapshot isolation without additional
locks, with T1 being TN: one transaction checked for children before
deleting the parent while the other checked for a parent before
inserting a child. Many other types of integrity enforcement would
fail under snapshot isolation if serializable behavior is assumed,
such as ensuring the existence of some minimum number of rows which
meet certain criteria (e.g., staffing) or ensuring that the sum of
some numeric column for a range of rows stays above a minimum amount
(e.g., banking).
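
A sketch of the referential integrity case, with made-up parent and
child tables; each transaction's check passes against its own
snapshot, yet both commit:

  -- Session A: delete a parent only if it has no children
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT count(*) FROM child WHERE parent_id = 1;   -- sees 0
  DELETE FROM parent WHERE id = 1;

  -- Session B: insert a child only if its parent exists
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT count(*) FROM parent WHERE id = 1;         -- still sees the parent
  INSERT INTO child (id, parent_id) VALUES (10, 1);

  -- Session A
  COMMIT;
  -- Session B
  COMMIT;   -- both succeed, leaving an orphaned child row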

My initial intuition was just plain wrong: simply applying to a
snapshot database the same locks which guarantee serializable behavior
in a non-snapshot database does not guarantee integrity.
There are a number of academic papers suggesting techniques for
handling these issues. Unfortunately, what I've seen so far suggests
that there are three main approaches available under PostgreSQL, all
with significant cost of one type or another.

(1) A rigorous analysis of all queries allowed against the database
to determine where transaction conflicts can occur, with expertise
needed to resolve each. Downsides are that ad hoc queries can cause
anomalies, that introducing a new query may require a time-intensive
review of all existing queries, and that locking changes may be needed
in programs which weren't otherwise modified.

(2) A less rigorous examination might suffice, if rather brutal
table-level locks are applied mechanically. Even this requires
knowledge of what's happening outside the area where the change is
being made. If the receipt example is solved by adding a table lock
on the receipt table to the code which updates the control record, the
control record update must be modified if some other code modifies
some new table (say, receipt_voids) based on looking at the control
record to determine what to do.
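
As a sketch of what (2) might look like for the receipt case, the
day-closing transaction takes a table lock broad enough to cover every
table that is written based on the control record (the lock mode
chosen here is only an illustration):

  -- Control record update under approach (2)
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  LOCK TABLE receipt IN EXCLUSIVE MODE;   -- waits for receipt inserts already
                                          -- in progress and blocks new ones
  UPDATE control SET deposit_date = '2008-12-30' WHERE id = 1;
  COMMIT;

If receipt_voids (or anything else) later starts being modified based
on the control record, it has to be added to this lock list, which is
exactly the maintenance burden described above.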

(3) A finer-grained approach would be to make no-effect updates to
rows to lock them if they are to be read for purposes of updating
something else in the transaction. This could have a high cost in
disk access and table bloat. It has the advantage of providing a
simple technique which, if applied consistently, doesn't require
knowledge of software beyond what is under development.
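
A sketch of (3) applied to the same case: the receipt transaction
makes a no-effect update to the control row before relying on what it
reads there, so that it conflicts directly with the day-closing update
(again, column names are assumptions):

  -- Receipt transaction under approach (3)
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  UPDATE control SET deposit_date = deposit_date WHERE id = 1;
    -- no-effect update: takes a row lock, and draws a "could not
    -- serialize access due to concurrent update" error if the
    -- day-closing transaction has updated this row since our snapshot
  SELECT deposit_date FROM control WHERE id = 1;
  INSERT INTO receipt (receipt_no, deposit_date, amount)
    VALUES (1002, '2008-12-29', 50.00);
  COMMIT;

Each such update writes a new row version, which is where the disk
access and bloat costs come from.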

Of course, if the anomalies are infrequent enough and of a nature
which can be tolerated, the whole issue can be ignored, or monitored
for manual correction.

Any suggestions for refining the description of where the anomalies
can occur or how best to work around any particular classes of them
are welcome. I'm particularly interested in practical methodologies
for ensuring that no holes are left through which business rules can
fail to be applied. There are so many papers on the topic that I'm not
sure where
to go next.

I hope someone can show me something good I've missed so far.

I haven't lost track of the language suggested by Robert Haas, which I
think frames the issue nicely. I just want to follow it with a
reasonable description of where it's an issue and how to handle it.

-Kevin
