Serializable snapshot isolation patch

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Kevin(dot)Grittner(at)wicourts(dot)gov
Subject: Serializable snapshot isolation patch
Date: 2010-10-18 05:53:45
Message-ID: 1287381225.8516.516.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This is based on the Kevin's git repo at:

git://git.postgresql.org/git/users/kgrittn/postgres.git
SHA1: 729541fa5ea94d66e6f4b22fb65bfef92214cd6b

* Trivial stuff:

I get a compiler warning:

indexfsm.c: In function ‘RecordFreeIndexPage’:
indexfsm.c:55: warning: implicit declaration of function
‘PageIsPredicateLocked’

* Open issues, as I see it:

1. 2PC and SSI don't mix (this may be a known issue, because there's not
really any code in the current implementation to deal with 2PC):

Session1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
select count(*) from a;
insert into a values(1);
PREPARE TRANSACTION 't1';

Session2:
BEGIN ISOLATION LEVEL SERIALIZABLE;
select count(*) from a;
insert into a values(1);
COMMIT;

Session1:
COMMIT PREPARED 't1';

Looks like we need to track information about prepared transactions in
shared memory. I think you'll need to keep the information in the 2PC
state file as well, so that it can be rebuilt after a crash or restart.
It all looks solvable at first glance, but it looks like it might be
some work.

2. I think there's a GiST bug (illustrating with PERIOD type):

create table foo(p period);
create index foo_idx on foo using gist (p);
insert into foo select period(
'2009-01-01'::timestamptz + g * '1 microsecond'::interval,
'2009-01-01'::timestamptz + (g+1) * '1 microsecond'::interval)
from generate_series(1,2000000) g;

Session1:
begin isolation level serializable;
select * from foo where p && '[2009-01-01, 2009-01-01]'::period;
insert into foo values('[2009-01-01, 2009-01-01]'::period);

Session2:
begin isolation level serializable;
select * from foo where p && '[2009-01-01, 2009-01-01]'::period;
insert into foo values('[2009-01-01, 2009-01-01]'::period);
commit;

Session1:
commit;

In pg_locks (didn't paste here due to formatting), it looks like the
SIRead locks are holding locks on different pages. Can you clarify your
design for GiST and the interaction with page-level locks? It looks like
you're making some assumption about which pages will be visited when
searching for conflicting values which doesn't hold true. However, that
seems odd, because even if the value is actually inserted in one
transaction, the other doesn't seem to find the conflict. Perhaps the
bug is simpler than that? Or perhaps I have some kind of odd bug in
PERIOD's gist implementation?

Also, it appears to be non-deterministic, to a degree at least, so you
may not observe the problem in the exact way that I do.

3. Limited shared memory space to hold information about committed
transactions that are still "interesting". Relevant thread:

http://archives.postgresql.org/pgsql-hackers/2010-09/msg01735.php

It's a challenging problem, however, and the current solution is less
than ideal. Idle transactions can mean that all new serializable
transactions fail until the idle transactions start to terminate. I
don't like that very much, because people expect to have to retry
serializable transactions, but retrying here has no real hope (except
that some time has elapsed, and maybe the other transaction decided to
commit).

A comparison is made (in the aforementioned thread) to the existing
limitation on the number of locks. However, it's much easier to
understand normal locks, and for a given workload usually you can put an
upper bound on the number of locks required (right?).

Does it make sense to kill the existing transactions that are holding
everything up, rather than the new transaction? Or would that just
confuse matters more? This does not necessarily guarantee that progress
can be made, either, but intuitively it seems more likely.

4. A few final details:

a. We should probably have a compatibility GUC that makes SERIALIZABLE
equal to REPEATABLE READ. My opinion is that this should be only for
compatibility, and should default to "off" (i.e. SSI code enabled)
either in 9.1 or soon after.

b. Docs.

* Questions:

1. For TargetTagIsCoveredBy(), why is it impossible for the covering tag
to have an offset?

2. The explanation for SerializablePredicateLockListLock is a little
confusing. It makes it sound like other processes can't walk the list,
but they can modify it?

* Summary

Great patch! I didn't make it through the patch in as much detail as I
would have liked, because the theory behind it is quite complex and it
will take longer for me to absorb. But the implementation looks good and
the use case is very important.

Regards,
Jeff Davis

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Samuel Gendler 2010-10-18 06:13:01 Re: No hash join across partitioned tables?
Previous Message Greg Smith 2010-10-18 02:15:36 Re: Debugging initdb breakage