Re: SIREAD lock versus ACCESS EXCLUSIVE lock

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SIREAD lock versus ACCESS EXCLUSIVE lock
Date: 2011-04-28 14:03:04
Message-ID: 4DB92D48020000250003CFDB@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> On Wed, Apr 27, 2011 at 8:59 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
>> For correct serializable behavior in the face of concurrent DDL
>> execution, I think that a request for a heavyweight ACCESS
>> EXCLUSIVE lock might need to block until all SIREAD locks on the
>> relation have been released. Picture, for example, what might
>> happen if one transaction acquires some predicate locks, then
>> commits (releasing its heavyweight lock on the table), and before
>> concurrent READ WRITE transactions complete there is a CLUSTER on
>> the table. Or a DROP INDEX. :-(
>
> Sorry, I can't picture it. What will happen?

Rather than get into a complex generalized discussion, I'll provide
the simplest example I can picture.

Let's say we have two concurrent transactions, T0 and T1. Up to
this point T0 has read from table x and written to table y based on
what was read from x. T1 has read from y -- but since the
transactions are concurrent, it doesn't see T0's write. Let's
assume each read was of a single tuple accessed through a btree
index, so each transaction has one tuple lock on the heap and one
page lock on the index. Now T0 commits. T0 must hold its SIREAD
locks because of concurrent transaction T1. Everything is fine so
far. Now a DBA runs CLUSTER against table x. The SIREAD locks held
by T0 are probably now wrong, because the tuple and its index entry
are likely to have moved. Now T1 writes to table x based on what it
read from y. It could incorrectly detect a conflict if it happens
to write to a tuple at the locked block and tuple number when it's
not the same row. Worse, it could miss detecting a conflict if it's
really updating the same row that T0 wrote, and that's not detected
because it's not at the locked location any more.

>> If this change is too destabilizing for this point in the release
>> we could document it as a limitation and fix it in 9.2.
>
> I don't think this should wait for 9.2
>
> It either works, or it doesn't. Putting caveats in there will just
> detract from people's belief in it.

I see your point. And this clearly is a bug. We failed to consider
this category of problem and cover it.

Heikki's suggestion is clearly the best plan. In the example above,
when the CLUSTER was run it would make a call to the predicate
locking module telling it to promote all SIREAD locks for table x or
any of its indexes into a relation level lock on table x. The
CLUSTER would cause us to lose the finer granularity of the locks on
the table, and in this example if T1 wrote to table x it be rolled
back with a serialization failure. This could be a false positive,
but we expect to have some of those -- the transaction is retried
and then succeeds. You can't have a false negative, so integrity is
preserved.

I'll try to work up a detailed plan of which commands need what
actions. For example, DROP INDEX needs to promote SIREAD locks on
the dropped index to relation locks on the related table. TRUNCATE
TABLE is a little confusing -- I think that if it's run in a
serializable transaction we generate a rw-conflict out from that
transaction to every transaction holding any SIREAD lock on that
table or any of its indexes, and then clear those SIREAD locks.
This'll take some study.

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2011-04-28 14:03:36 Re: unknown conversion %m
Previous Message Marko Kreen 2011-04-28 13:53:57 Re: Extension Packaging