Re: Serializable Isolation without blocking

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Kevin Grittner *EXTERN*" <Kevin(dot)Grittner(at)wicourts(dot)gov>, <pgsql-hackers(at)postgresql(dot)org>
Cc: <mjc(at)it(dot)usyd(dot)edu(dot)au>
Subject: Re: Serializable Isolation without blocking
Date: 2009-05-07 09:51:10
Message-ID: D960CB61B694CF459DCFB4B0128514C202FF65AF@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kevin Grittner wrote:
> While discussing potential changes to PostgreSQL documentation of
> transaction isolation levels, Emmanuel Cecchet pointed out an
> intriguing new paper[1] on a new algorithm to provide true
> serializable behavior in a MVCC based database, with no additional
> blocking; although, there being no such things as a free lunch, there
> is an increase in serialization failures under contention.

I have read through the paper and will share my comments.

I hope I got it right:

To put it in a nutshell, the approach to true serializability presented
in the paper involves "intention locks" which do not block writers,
but cause transactions with conflict potential to be aborted.

The main cost incurred is the maintenance of these intention locks, which
need to be kept for a while even after a transaction has committed.
Moreover, there will potentially be many of these locks (think of
SELECT COUNT(*) FROM ...), so some lock escalation mechanism (to
page or table locks) will be necessary.

I don't know how hard that would be to implement, but I'd argue
that it is only worth considering if it guarantees true serializability.

The paper describes only intention locks for rows in the select list
of a statement and deliberately ignores rows which are examined in
the WHERE clause. The authors argue in section 3.4 that this is no
problem in their implementation since "Berkeley DB does all locking
and versioning on page granularity".

I don't buy that; maybe I misunderstood something.

Consider a function
"makehighlander(personid integer) RETURNS void"
defined like this:

SELECT ishighlander INTO b FROM scots WHERE id=personid;
IF b THEN
RETURN; /* no need to do anything */
END IF;
UPDATE scots SET ishighlander=TRUE WHERE id=personid;
SELECT count(*) INTO n FROM scots WHERE ishighlander;
IF (n > 1) THEN
RAISE EXCEPTION 'There can be only one';
END IF;

If we assume that "ishighlander" is false for all records in
the beginning, and there are two calls to the function with
two personid's of records *in different pages*, then there cannot be
any conflicts since all (write and intention) locks taken by each of
these calls should only affect the one page that contains the one
record that is updated and then found in the subsequent SELECT.

Yet if the two execute concurrently and the two first SELECTs are
executed before the two UPDATEs, then both functions have a snapshot
so that the final SELECT statements will return 1 and both functions will
succeed, leaving the table with two highlanders.

So I think one would have to add intention locks for rows considered
in the WHERE clause to guarantee true serializability.

It would be interesting to know how many lines of code would have
to be added to implement that and how performance would be affected.
I'm afraid that concurrency would suffer because more conflicting
transactions would be aborted.

Another thing I wonder is whether the authors have considered the
possibility that there are serializable and "cursor stability"
transactions at the same time, where the latter would not take
intention locks. Could that affect the considerations about
correctness of the algorithm?

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Meskes 2009-05-07 09:54:08 Re: ECPG, two varchars with same name on same line
Previous Message Laurent Laborde 2009-05-07 09:25:15 Re: create if not exists (CINE)