Re: locks in CREATE TRIGGER, ADD FK

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 15:57:17
Message-ID: 6934.1111593437@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> Tom Lane wrote:
>> It isn't 100% MVCC, I agree. But it works because system catalog
>> lookups are SnapshotNow, and so when another session comes and wants to
>> look at the table it will see the committed new version of the pg_class
>> row pointing at the new relfilenode file.

> If by "works", you mean "provides correct transactional semantics", then
> that simply isn't true. Not making CLUSTER and similar DDL commands MVCC
> compliant isn't the end of the world, I agree, but that doesn't make it
> correct, either.

I agree that we aren't MVCC with respect to DDL operations (and for this
purpose CLUSTER is DDL). Trying to become so would open a can of worms
far larger than it's worth, though, IMHO.

Let me give you a couple of illustrations of why things are the way they
are. Consider

Transaction 1 Transaction 2

BEGIN;

... BEGIN;

... INSERT INTO a;

CLUSTER a; ...

Currently, because T2 continues to hold a write lock on A until it
commits, T1's CLUSTER will block until T2 commits; therefore CLUSTER's
use of SnapshotNow is sufficient to not lose any live tuples. Were T1
to use a transaction-aware snapshot to scan A, this would not be so.
(SnapshotAny would work, but CLUSTER would have to become an order of
magnitude harder than it is now, because it would have to preserve
UPDATE chain link relationships ... compare VACUUM FULL's treatment of
tuple chains.)

Transaction 1 Transaction 2

BEGIN;

... CLUSTER a;

INSERT INTO a;

Were T1 using a transaction-aware snapshot to read pg_class, it would
insert its new tuple into the wrong relfilenode for A, causing either
immediate failure or eventual loss of a live tuple.

Transaction 1 Transaction 2

BEGIN;

... CREATE TRIGGER ... ON INSERT TO a ...

INSERT INTO a;

Were T1 using a transaction-aware snapshot to read pg_trigger, it would
fail to fire the already-committed insert trigger. (Variants of this
include failing to insert index entries into a new index, failing to
apply a new constraint, etc.) You don't have to assume that T1 is
in Serializable mode, either. It might be using Read Committed, but
the INSERT starts and sets its snapshot while T2 is still in progress;
then of course blocks until T2 commits; then does the wrong thing
because it is still paying attention to pre-T2 catalog entries. This is
why LockRelation accepts SI inval messages immediately after each lock
acquisition; it's to ensure that we do see the effects of T2.

Transaction 1 Transaction 2

BEGIN;

SELECT FROM a;

... CREATE TRIGGER ... ON INSERT TO a ...

INSERT INTO a;

Ordinarily, once T1 has touched relation A, it can be sure that A's
schema will not change until end of transaction. The change you
committed last night removes that guarantee, at least for the
limited case of triggers, and makes the above interleaving possible.
While I haven't come up with a clear failure case after a few minutes'
thought, I'm not convinced that there isn't one.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2005-03-23 16:13:59 Re: \x in psql
Previous Message Tom Lane 2005-03-23 15:04:43 Re: locks in CREATE TRIGGER, ADD FK