Re: incoherent view of serializable transactions

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: incoherent view of serializable transactions
Date: 2008-12-23 14:51:03
Message-ID: 4950A677.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> At this point, SERIALIZABLE transactions appear to have worked,
with
>> receipt 3 happening before the update of deposit_date; however,
there
>> was a window of time when the update to deposit_date was visible
and
>> receipt 3 was not.
>
>> This absolutely can't happen in a standard-compliant
implementation.
>
> I think you mean "you'd like to believe that can't happen in a
> standard-compliant implementation".

"The execution of concurrent SQL-transactions at isolation level
SERIALIZABLE is guaranteed to be serializable. A serializable
execution is defined to be an execution of the operations of
concurrently executing SQL-transactions that produces the same
effect as some serial execution of those same SQL-transactions. A
serial execution is one in which each SQL-transaction executes to
completion before the next SQL-transaction begins."

If you look at the serializable queries in the original post for this
thread, it's not hard to see that this standard is not met. The
insert of receipt 3 appears to happen before the update of the control
record, since it has the old deposit date. The transaction which
selects from both tables sees the update to the control record, so it
must come after that. Yet it doesn't see the results of the first
transaction. There is no sequence of serial execution which is
consistent with the behavior.

Perhaps you are suggesting that it's not possible in a practical sense
for a DBMS to meet this standard? See below.

> It doesn't include any of the
> specific behaviors that are forbidden by the spec, though, so I'm
less
> than convinced.

Following the diagram of specific behaviors allowed at each
level is good evidence that these phenomena don't *define*
serializable:

"NOTE 53 * The exclusion of these phenomena for SQL-transactions
executing at isolation level SERIALIZABLE is a consequence of the
requirement that such transactions be serializable."

> An appropriate way to prevent the problem is probably for the
> transaction that changes the deposit_date to take out a
write-excluding
> lock on the receipts table before it does so.

Well, a serializable transaction operating to standard would probably
take out a write-excluding lock on the control table row when it is
read. This would block the update to the control table until
completion of any pending receipts on the old date. The blocked
update would then take out a read-excluding lock before updating the
row, which would then block receipt transactions looking to check the
date. As soon as the update to the control table is committed, you
can see the new date, and you can have confidence that the old date's
receipts will all show on a SELECT.

Again, I'm not suggesting that we change the behavior of PostgreSQL to
match this, but that we document the difference so that those looking
to switch to PostgreSQL from databases which provide true serializable
transactions know that they have to explicitly lock rows to maintain a
coherent view of the data. In the queries shown in the original post,
the INSERT of the receipts could read the deposit date FOR SHARE to
provide the equivalent functionality, although they would have to be
rejiggered a bit, since that isn't allowed in subqueries.

This isn't some hypothetical "maybe some day some product might
implement this, but it'll never catch on" sort of thing -- Microsoft
and Sybase SQL Server had this from version 1. I used it from 1990
until the conversion to PostgreSQL over the last couple years.
Serializable transactions worked as advertised. Was there more
blocking than PostgreSQL? Yes. Were there more deadlocks than
PostgreSQL? Yes. Did it impact performance? Well, PostgreSQL is
faster by enough that users commented that the applications seemed
"snappier" when we switched the database underneath them from Sybase
to PostgreSQL.

I'm going on second-hand information here, but I'm told that IBM DB2
has used similar techniques to provide true serializable transactions
for even longer.

I'm somewhat mystified at the reaction this topic gets here. :-/

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua Tolley 2008-12-23 14:51:51 Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Previous Message Fujii Masao 2008-12-23 14:36:47 Re: Sync Rep: First Thoughts on Code