Re: incoherent view of serializable transactions

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Greg Stark" <greg(dot)stark(at)enterprisedb(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: incoherent view of serializable transactions
Date: 2008-12-30 17:28:01
Message-ID: 495A05C1.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>> Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> Greg Stark wrote:
>> And I don't see why you discard "visibility" as unimportant. All
>> the transaction isolations are defined in terms of the results if
>> the transactions. Those results include both the database state and
>> the data returned by the queries. Otherwise "phantom read" is a
>> meaningless concept.
>
> Basically, if he wants to make a rigid argument that some scenario
> violates the serializability promise, then it is necessary to prove:
>
> (1) There is no serial schedule for the set of transactions that
> achieves the same outcome.
>
> - or -
>
> (2) A phantom read situation occurs.

Agreed, except that (2) is a subset of (1), so (1) alone is
sufficient. (How can a read not be repeatable if there are no
concurrent transactions?)

I feel that I did provide a proof that the transactions couldn't
represent any serial execution in my original email. This seems to be
disputed with the argument that a SELECT from a database is not
required to provide coherent data that represents some point in the
serializable stream of transactions. I disagree, but as I pointed out
previously, it is trivial to capture the results of any such query
into a table and thereby persist the problem within the database.

Here we go. I've labeled the transactions consistently with new
thread I started trying to characterize the full scope of issues and
workarounds.

-- setup
drop if exists ctl, receipt, receipt_totals;
create table ctl (k text not null primary key, deposit_date date not
null);
insert into ctl values ('receipt', date '2008-12-22');
create table receipt (receipt_no int not null primary key,
deposit_date date not null, amount numeric(13,2));
insert into receipt values (1, (select deposit_date from ctl where k =
'receipt'), 1.00);
insert into receipt values (2, (select deposit_date from ctl where k =
'receipt'), 2.00);
create table receipt_totals (deposit_date date not null primary key,
next_date date not null, deposit_total numeric(13,2) not null);

-- connection 1 (start of T0)
start transaction isolation level serializable;
insert into receipt values (3, (select deposit_date from ctl where k =
'receipt'), 4.00);

-- connection 2 (T1)
start transaction isolation level serializable;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;

-- connection 2 (TN version 1)
start transaction isolation level serializable;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt where deposit_date = date '2008-12-22';
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit transaction;

-- connection 2 (TN version 2)
start transaction isolation level serializable;
insert into receipt_totals
select r.deposit_date, c.deposit_date, sum(amount)
from ctl c join receipt r
on ( r.deposit_date < c.deposit_date
and not exists
(
select * from receipt r2
where r2.deposit_date < c.deposit_date
and r2.deposit_date > r.deposit_date
)
)
group by r.deposit_date, c.deposit_date;
commit transaction;

-- connection 1 (end of T0)
commit transaction;

After all this is done, a select from receipt_totals shows:

deposit_date | next_date | deposit_total
--------------+------------+---------------
2008-12-22 | 2008-12-23 | 3.00
(1 row)

Here goes the proof, although I'm not going to be overly formal in the
language.

(1) If these transactions were serialized, T0 must come before T1,
because T0 uses the ctl.deposit_date before it is updated by T1.

(2) If these transactions were serialized, T1 must come before TN
(either version), because the TN transactions see the ctl.deposit_date
set by T1.

(3) If these transactions were serialized, the TN transactions must
come before T0, since they don't see the row inserted by T0.

(4) Since serialization requires that T0 < T1 < TN < T0 (comparing
time sequence) the transactions cannot be considered serialized.

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2008-12-30 17:58:15 Re: Documenting serializable vs snapshot isolation levels
Previous Message Bruce Momjian 2008-12-30 17:25:23 Re: about truncate