Re: FOR KEY LOCK foreign keys

From: Noah Misch <noah(at)leadboat(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Subject: Re: FOR KEY LOCK foreign keys
Date: 2011-02-11 07:13:22
Message-ID: 20110211071322.GB26971@tornado.leadboat.com
Lists: pgsql-hackers

Hi Alvaro,

On Thu, Jan 13, 2011 at 06:58:09PM -0300, Alvaro Herrera wrote:
> As previously commented, here's a proposal with patch to turn foreign
> key checks into something less intrusive.
>
> The basic idea, as proposed by Simon Riggs, was discussed in a previous
> pgsql-hackers thread here:
> http://archives.postgresql.org/message-id/AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
>
> It goes like this: instead of acquiring a shared lock on the involved
> tuple, we only acquire a "key lock", that is, something that prevents
> the tuple from going away entirely but not from updating fields that are
> not covered by any unique index.

First off, this is highly valuable work. My experience echoes that of another
commenter (I *think* it was Josh Berkus, but I can't find the original
reference now): this is the #1 cause of production deadlocks. To boot, the
patch is small and fits cleanly into the current code.

The patch had a trivial conflict in planner.c, plus plenty of offsets. I've
attached the rebased patch that I used for review. For anyone following along,
all the interesting hunks touch heapam.c; the rest is largely mechanical. A
"diff -w" patch is also considerably easier to follow.

Incidentally, HeapTupleSatisfiesMVCC has some bits of code like this (not new):

/* MultiXacts are currently only allowed to lock tuples */
Assert(tuple->t_infomask & HEAP_IS_LOCKED);

They're specifically only allowed for SHARE and KEY locks, right?
heap_lock_tuple seems to assume as much.

Having read [1], I tried to work out what kind of table-level lock we must hold
before proceeding with a DDL operation that changes the set of "key" columns.
The hazard is an UPDATE deciding whether it needs to conflict with a FOR KEY
LOCK based on a stale notion of which columns are keys. To prevent that, it's
sufficient for the DDL to take ShareLock, which conflicts with the
RowExclusiveLock every UPDATE holds on its target table. CREATE INDEX does just
this, so we're good.

[1] http://archives.postgresql.org/message-id/22196.1282757644@sss.pgh.pa.us
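
To make the above concrete, a sketch (the index name is invented; "parent" is
the table from the test case below):

S1: BEGIN;
S1: UPDATE parent SET aux = 'x';
    -- takes RowExclusiveLock on parent
S2: CREATE UNIQUE INDEX parent_aux_key ON parent (aux);
    -- wants ShareLock on parent; blocks until S1 commits or aborts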

I observe visibility breakage with this test case:

-- Setup
BEGIN;
DROP TABLE IF EXISTS child, parent;
CREATE TABLE parent (
parent_key int PRIMARY KEY,
aux text NOT NULL
);
CREATE TABLE child (
child_key int PRIMARY KEY,
parent_key int NOT NULL REFERENCES parent
);
INSERT INTO parent VALUES (1, 'foo');
COMMIT;
TABLE parent; -- set hint bit
SELECT to_hex(t_infomask::int), * FROM heap_page_items(get_raw_page('parent', 0));
 to_hex | lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
--------+----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
 902    |  1 |   8160 |        1 |     32 |   1125 |      0 |       33 | (0,1)  |           2 |       2306 |     24 |        |

-- Interleaved part
P0:
BEGIN;
INSERT INTO child VALUES (1, 1);
P1:
BEGIN;
SELECT to_hex(t_infomask::int), * FROM heap_page_items(get_raw_page('parent', 0));
 to_hex | lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
--------+----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
 112    |  1 |   8160 |        1 |     32 |   1125 |   1126 |       33 | (0,1)  |           2 |        274 |     24 |        |
UPDATE parent SET aux = 'baz'; -- UPDATE 1
TABLE parent; -- 0 rows
SELECT to_hex(t_infomask::int), * FROM heap_page_items(get_raw_page('parent', 0));
 to_hex | lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
--------+----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
 102    |  1 |   8160 |        1 |     32 |   1125 |   1128 |        0 | (0,2)  |       16386 |        258 |     24 |        |
 2012   |  2 |   8128 |        1 |     32 |   1128 |   1126 |     2249 | (0,2)  |      -32766 |       8210 |     24 |        |

The problem seems to be that funny t_cid (2249). Tracing through heap_update,
the new code is not setting t_cid during this test case.

My own deadlock test case, which is fixed by the patch, uses the same setup.
Its interleaved part is as follows:

P0: INSERT INTO child VALUES (1, 1);
P1: INSERT INTO child VALUES (2, 1);
P0: UPDATE parent SET aux = 'bar';
P1: UPDATE parent SET aux = 'baz';

> As discussed, this is still more restrictive than necessary (we could
> lock only those columns that are involved in the foreign key being
> checked), but that has all sorts of implementation level problems, so we
> settled for this, which is still much better than the current state of
> affairs.

Agreed. What about locking only the columns actually used by some incoming
foreign key (not just the FK in question at the time)? The only extra cost
would fall on a cold relcache: a pg_depend scan per unique index.
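
The lookup itself could stay cheap. A sketch of the cold-cache case, using
pg_constraint.conindid (which records the unique index each incoming FK relies
on) in place of a raw pg_depend scan:

-- unique indexes actually used by foreign keys arriving at "parent"
SELECT DISTINCT conindid::regclass
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'parent'::regclass;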

Usually, each of my tables has no more than one candidate key referenced by
FOREIGN KEY constraints: the explicit or notional primary key. I regularly add
UNIQUE indexes not used by any foreign key, though. YMMV. Given this
optimization, constraining the lock even further by individual FOREIGN KEY
constraint would be utterly unimportant for my databases.

> I published about this here:
> http://commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/
>
> So, as a rough design,
>
> 1. Create a new SELECT locking clause. For now, we're calling it SELECT FOR KEY LOCK
> 2. This will acquire a new type of lock in the tuple, dubbed a "keylock".
> 3. This lock will conflict with DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE.

It does not conflict with SELECT FOR SHARE, does it?

> 4. It also conflicts with UPDATE if the UPDATE modifies an attribute
> indexed by a unique index.

This is the per-tuple lock conflict table before your change:

FOR SHARE conflicts with FOR UPDATE
FOR UPDATE conflicts with FOR UPDATE and FOR SHARE

After:

FOR KEY LOCK conflicts with FOR UPDATE
FOR SHARE conflicts with FOR UPDATE
FOR UPDATE conflicts with FOR UPDATE, FOR SHARE, (FOR KEY LOCK if cols <@ keycols)

The odd thing here is the checking of an outside condition to decide whether
locks conflict. Normally, to get a different conflict list, we add another lock
type. What about this?

FOR KEY SHARE conflicts with FOR KEY UPDATE
FOR SHARE conflicts with FOR KEY UPDATE, FOR UPDATE
FOR UPDATE conflicts with FOR KEY UPDATE, FOR UPDATE, FOR SHARE
FOR KEY UPDATE conflicts with FOR KEY UPDATE, FOR UPDATE, FOR SHARE, FOR KEY SHARE

This would also fix Joel's test case. A disadvantage is that we'd check for
changes in FK-referenced columns even when there's no key lock activity. That
seems acceptable, but it's a point for debate.
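
For concreteness, here's how I'd expect the proposed modes to behave, using the
hypothetical FOR KEY SHARE spelling and the parent/child tables from below:

P0: BEGIN;
P0: SELECT * FROM parent WHERE parent_key = 1 FOR KEY SHARE;
    -- hypothetical syntax
P1: UPDATE parent SET aux = 'bar';
    -- touches no key column, so takes only FOR UPDATE: proceeds
P1: UPDATE parent SET parent_key = 2;
    -- touches a key column, so takes FOR KEY UPDATE: blocks on P0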

Either way, SELECT ... FOR UPDATE will probably end up different from a true
UPDATE. The full behavior relies on having an old tuple to bear the UPDATE lock
and a new tuple to bear the KEY lock. In the current patch, SELECT ... FOR
UPDATE blocks on KEY just like SHARE does. So there will be that wart in the
conflict lists, no matter what.

> Here's a patch for this, on which I need to do some more testing and
> update docs.
>
> Some patch details:
>
> 1. We use a new bit in t_infomask for HEAP_XMAX_KEY_LOCK, 0x0010.
> 2. Key-locking a tuple means setting the XMAX_KEY_LOCK bit, and setting the
> Xmax to the locker (just like the other lock marks). If the tuple is
> already key-locked, a MultiXactId needs to be created from the
> original locker(s) and the new transaction.

Makes sense.

> 3. The original tuple needs to be marked with the Cmax of the locking
> command, to prevent it from being seen in the same transaction.

Could you elaborate on this requirement?

> 4. A non-conflicting update to the tuple must carry forward some fields
> from the original tuple into the updated copy. Those include Xmax,
> XMAX_IS_MULTI, XMAX_KEY_LOCK, and the CommandId and COMBO_CID flag.

HeapTupleHeaderGetCmax() has this assertion:

/* We do not store cmax when locking a tuple */
Assert(!(tup->t_infomask & (HEAP_MOVED | HEAP_IS_LOCKED)));

Assuming that assertion is still valid, there will never be a HEAP_COMBOCID flag
to copy. Right?

> 5. We check for the is-indexed condition early in heap_update. This
> check is independent of the HOT check, which occurs later in the
> routine.
> 6. The relcache entry now keeps two lists of indexed attributes; the new
> one only covers unique indexes. Both lists are built in a single
> pass over the index list and saved in the relcache entry, so a
> heap_update call only does this once. The main difference between
> the two checks is that the one for HOT is done after the tuple has
> been toasted. This cannot be done for this check, because the
> toaster runs too late. This means some work is duplicated. We
> could optimize this further.

Seems reasonable.

> Something else that might be of interest: the patch as presented here
> does NOT solve the deadlock problem originally presented by Joel
> Jacobson. It does solve the second, simpler example I presented in my
> blog article referenced above, however. I need to have a closer look at
> that problem to figure out if we could fix the deadlock too.

One thing that helped me to think through Joel's test case is that the two
middle statements take tuple-level locks, but that's inessential. Granted, FOR
UPDATE tuple locks are by far the most common kind of blocking in production.
Here's another formulation that also still gets a deadlock:

P1: BEGIN;
P2: BEGIN;
P1: UPDATE A SET Col1 = 1 WHERE AID = 1; -- FOR UPDATE tuple lock
P2: LOCK TABLE pg_am IN ROW SHARE MODE;
P1: LOCK TABLE pg_am IN ROW SHARE MODE; -- blocks
P2: UPDATE B SET Col2 = 1 WHERE BID = 2; -- blocks for KEY => deadlock

As best I can tell, the explanation is that this patch only improves things when
the FOR KEY LOCK precedes the FOR UPDATE. Splitting out FOR KEY UPDATE fixes
that. It would also optimize this complement to your own blog post example,
which still blocks needlessly:

-- Session 1
CREATE TABLE foo (a int PRIMARY KEY, b text);
CREATE TABLE bar (a int NOT NULL REFERENCES foo);
INSERT INTO foo VALUES (42);

BEGIN;
UPDATE foo SET b = 'Hello World';

-- Session 2
INSERT INTO bar VALUES (42);

Automated tests would go a long way toward building confidence that this patch
does the right thing. Thanks to the SSI patch, we now have an in-tree test
framework for testing interleaved transactions. The only thing it needs to be
suitable for this work is a way to handle blocked commands. If you like, I can
try to whip something up for that.
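
To sketch what I have in mind, my deadlock case above might look roughly like
this in the isolation tester's spec format (step names are mine, and this
presumes the blocked-command handling exists):

setup
{
  CREATE TABLE parent (parent_key int PRIMARY KEY, aux text NOT NULL);
  CREATE TABLE child (child_key int PRIMARY KEY,
                      parent_key int NOT NULL REFERENCES parent);
  INSERT INTO parent VALUES (1, 'foo');
}

teardown
{
  DROP TABLE child, parent;
}

session "s1"
setup        { BEGIN; }
step "s1ins" { INSERT INTO child VALUES (1, 1); }
step "s1upd" { UPDATE parent SET aux = 'bar'; }
teardown     { COMMIT; }

session "s2"
setup        { BEGIN; }
step "s2ins" { INSERT INTO child VALUES (2, 1); }
step "s2upd" { UPDATE parent SET aux = 'baz'; }
teardown     { COMMIT; }

permutation "s1ins" "s2ins" "s1upd" "s2upd"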

Hunk-specific comments (based on diff -w version of patch):

> *** a/src/backend/access/heap/heapam.c
> --- b/src/backend/access/heap/heapam.c

> ***************
> *** 2484,2489 **** l2:
> --- 2487,2508 ----
> xwait = HeapTupleHeaderGetXmax(oldtup.t_data);
> infomask = oldtup.t_data->t_infomask;
>
> + /*
> + * if it's only key-locked and we're not updating an indexed column,
> + * we can act though MayBeUpdated was returned, but the resulting tuple
> + * needs a bunch of fields copied from the original.
> + */
> + if ((infomask & HEAP_XMAX_KEY_LOCK) &&
> + !(infomask & HEAP_XMAX_SHARED_LOCK) &&
> + HeapSatisfiesHOTUpdate(relation, keylck_attrs,
> + &oldtup, newtup))
> + {
> + result = HeapTupleMayBeUpdated;
> + keylocked_update = true;
> + }

The condition for getting here is "result == HeapTupleBeingUpdated && wait". If
!wait, we'd never get the chance to see if this would avoid the wait. Currently
all callers pass wait = true, so this is academic.

> +
> + if (!keylocked_update)
> + {
> LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>
> /*
> ***************
> *** 2563,2568 **** l2:
> --- 2582,2588 ----
> else
> result = HeapTupleUpdated;
> }
> + }
>
> if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
> {
> ***************
> *** 2609,2621 **** l2:
>
> newtup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
> newtup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
> ! newtup->t_data->t_infomask |= (HEAP_XMAX_INVALID | HEAP_UPDATED);
> HeapTupleHeaderSetXmin(newtup->t_data, xid);
> - HeapTupleHeaderSetCmin(newtup->t_data, cid);
> - HeapTupleHeaderSetXmax(newtup->t_data, 0); /* for cleanliness */
> newtup->t_tableOid = RelationGetRelid(relation);
>
> /*
> * Replace cid with a combo cid if necessary. Note that we already put
> * the plain cid into the new tuple.
> */
> --- 2629,2671 ----
>
> newtup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
> newtup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
> ! newtup->t_data->t_infomask |= HEAP_UPDATED;
> HeapTupleHeaderSetXmin(newtup->t_data, xid);
> newtup->t_tableOid = RelationGetRelid(relation);
>
> /*
> + * If this update is touching a tuple that was key-locked, we need to
> + * carry forward some bits from the old tuple into the new copy.
> + */
> + if (keylocked_update)
> + {
> + HeapTupleHeaderSetXmax(newtup->t_data,
> + HeapTupleHeaderGetXmax(oldtup.t_data));
> + newtup->t_data->t_infomask |= (oldtup.t_data->t_infomask &
> + (HEAP_XMAX_IS_MULTI |
> + HEAP_XMAX_KEY_LOCK));
> + /*
> + * we also need to copy the combo CID stuff, but only if the original
> + * tuple was created by us; otherwise the combocid module complains
> + * (Alternatively we could use HeapTupleHeaderGetRawCommandId)
> + */

This comment should describe why it's correct, not just indicate that another
module complains if we do otherwise.

> + if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(oldtup.t_data)))
> + {
> + newtup->t_data->t_infomask |= (oldtup.t_data->t_infomask &
> + HEAP_COMBOCID);

HeapTupleHeaderSetCmin unsets HEAP_COMBOCID, so this is a no-op.

> + HeapTupleHeaderSetCmin(newtup->t_data,
> + HeapTupleHeaderGetCmin(oldtup.t_data));
> + }
> +
> + }
> + else
> + {
> + newtup->t_data->t_infomask |= HEAP_XMAX_INVALID;
> + HeapTupleHeaderSetXmax(newtup->t_data, 0); /* for cleanliness */
> + HeapTupleHeaderSetCmin(newtup->t_data, cid);
> + }

As mentioned above, this code can fail to set Cmin entirely.

> +
> + /*
> * Replace cid with a combo cid if necessary. Note that we already put
> * the plain cid into the new tuple.
> */
> ***************
> *** 3142,3148 **** heap_lock_tuple(Relation relation, HeapTuple tuple, Buffer *buffer,
> LOCKMODE tuple_lock_type;
> bool have_tuple_lock = false;
>
> ! tuple_lock_type = (mode == LockTupleShared) ? ShareLock : ExclusiveLock;
>
> *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
> LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> --- 3192,3211 ----
> LOCKMODE tuple_lock_type;
> bool have_tuple_lock = false;
>
> ! /* in FOR KEY LOCK mode, we use a share lock temporarily */

I found this comment confusing. The first several times I read it, I thought it
meant that we start out by setting HEAP_XMAX_SHARED_LOCK in the tuple, then
downgrade it. However, this is talking about the ephemeral heavyweight lock.
Maybe it's just me, but consider deleting this comment.

> ! switch (mode)
> ! {
> ! case LockTupleShared:
> ! case LockTupleKeylock:
> ! tuple_lock_type = ShareLock;
> ! break;
> ! case LockTupleExclusive:
> ! tuple_lock_type = ExclusiveLock;
> ! break;
> ! default:
> ! elog(ERROR, "invalid tuple lock mode");
> ! tuple_lock_type = 0; /* keep compiler quiet */
> ! }
>
> *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
> LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> ***************
> *** 3175,3192 **** l3:
> LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
>
> /*
> ! * If we wish to acquire share lock, and the tuple is already
> ! * share-locked by a multixact that includes any subtransaction of the
> * current top transaction, then we effectively hold the desired lock
> * already. We *must* succeed without trying to take the tuple lock,
> * else we will deadlock against anyone waiting to acquire exclusive
> * lock. We don't need to make any state changes in this case.
> */
> ! if (mode == LockTupleShared &&
> (infomask & HEAP_XMAX_IS_MULTI) &&
> MultiXactIdIsCurrent((MultiXactId) xwait))
> {
> ! Assert(infomask & HEAP_XMAX_SHARED_LOCK);
> /* Probably can't hold tuple lock here, but may as well check */
> if (have_tuple_lock)
> UnlockTuple(relation, tid, tuple_lock_type);
> --- 3238,3255 ----
> LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
>
> /*
> ! * If we wish to acquire a key or share lock, and the tuple is already
> ! * share- or key-locked by a multixact that includes any subtransaction of the
> * current top transaction, then we effectively hold the desired lock
> * already. We *must* succeed without trying to take the tuple lock,
> * else we will deadlock against anyone waiting to acquire exclusive
> * lock. We don't need to make any state changes in this case.
> */
> ! if ((mode == LockTupleShared || mode == LockTupleKeylock) &&
> (infomask & HEAP_XMAX_IS_MULTI) &&
> MultiXactIdIsCurrent((MultiXactId) xwait))
> {
> ! Assert(infomask & HEAP_IS_SHARE_LOCKED);
> /* Probably can't hold tuple lock here, but may as well check */
> if (have_tuple_lock)
> UnlockTuple(relation, tid, tuple_lock_type);

If we're upgrading from KEY LOCK to a SHARE, we can't take this shortcut. At a
minimum, we need to update t_infomask.

Then there's a choice: do we queue up normally and risk deadlock, or do we skip
the heavyweight lock queue and risk starvation? Your last blog post suggests a
preference for the latter. I haven't formed a strong preference, but given this
behavior, ...

P0: FOR SHARE -- acquired
P1: UPDATE -- blocks
P2: FOR SHARE -- blocks

... I'm not sure why making the first lock FOR KEY LOCK ought to change things.

Some documentation may be in order about the deadlock hazards of mixing FOR
SHARE locks with foreign key usage.

> ***************
> *** 3217,3226 **** l3:
> have_tuple_lock = true;
> }
>
> ! if (mode == LockTupleShared && (infomask & HEAP_XMAX_SHARED_LOCK))
> {
> /*
> ! * Acquiring sharelock when there's at least one sharelocker
> * already. We need not wait for him/them to complete.
> */
> LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> --- 3280,3290 ----
> have_tuple_lock = true;
> }
>
> ! if ((mode == LockTupleShared || mode == LockTupleKeylock) &&
> ! (infomask & HEAP_IS_SHARE_LOCKED))
> {
> /*
> ! * Acquiring sharelock or keylock when there's at least one such locker
> * already. We need not wait for him/them to complete.
> */
> LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);

Likewise: we cannot implicitly upgrade someone else's KEY LOCK to SHARE.

> ***************
> *** 3476,3482 **** l3:
> xlrec.target.tid = tuple->t_self;
> xlrec.locking_xid = xid;
> xlrec.xid_is_mxact = ((new_infomask & HEAP_XMAX_IS_MULTI) != 0);
> ! xlrec.shared_lock = (mode == LockTupleShared);
> rdata[0].data = (char *) &xlrec;
> rdata[0].len = SizeOfHeapLock;
> rdata[0].buffer = InvalidBuffer;
> --- 3543,3549 ----
> xlrec.target.tid = tuple->t_self;
> xlrec.locking_xid = xid;
> xlrec.xid_is_mxact = ((new_infomask & HEAP_XMAX_IS_MULTI) != 0);
> ! xlrec.lock_strength = mode == LockTupleShared ? 's' : mode == LockTupleKeylock ? 'k' : 'x';

Seems strange having these character literals. Why not just cast the mode to a
char? Could even set the enum values to the ASCII values of those characters,
if you were so inclined. Happily, they fall in the right order.
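
That is, roughly (a sketch of the suggestion, not code from the patch):

/* hypothetical: pin the enum members to their WAL representations */
typedef enum LockTupleMode
{
    LockTupleKeylock   = 'k',       /* SELECT FOR KEY LOCK */
    LockTupleShared    = 's',       /* SELECT FOR SHARE */
    LockTupleExclusive = 'x'        /* SELECT FOR UPDATE */
} LockTupleMode;

    ...
    xlrec.lock_strength = (char) mode;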

> rdata[0].data = (char *) &xlrec;
> rdata[0].len = SizeOfHeapLock;
> rdata[0].buffer = InvalidBuffer;

> *** a/src/backend/executor/execMain.c
> --- b/src/backend/executor/execMain.c

> ***************
> *** 112,119 **** lnext:
> /* okay, try to lock the tuple */
> if (erm->markType == ROW_MARK_EXCLUSIVE)
> lockmode = LockTupleExclusive;
> ! else
> lockmode = LockTupleShared;
>
> test = heap_lock_tuple(erm->relation, &tuple, &buffer,
> &update_ctid, &update_xmax,
> --- 112,126 ----
> /* okay, try to lock the tuple */
> if (erm->markType == ROW_MARK_EXCLUSIVE)
> lockmode = LockTupleExclusive;
> ! else if (erm->markType == ROW_MARK_SHARE)
> lockmode = LockTupleShared;
> + else if (erm->markType == ROW_MARK_KEYLOCK)
> + lockmode = LockTupleKeylock;
> + else
> + {
> + elog(ERROR, "unsupported rowmark type");
> + lockmode = LockTupleExclusive; /* keep compiler quiet */
> + }

A switch statement would be more consistent with what you've done elsewhere.
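
That is, something along these lines:

    switch (erm->markType)
    {
        case ROW_MARK_EXCLUSIVE:
            lockmode = LockTupleExclusive;
            break;
        case ROW_MARK_SHARE:
            lockmode = LockTupleShared;
            break;
        case ROW_MARK_KEYLOCK:
            lockmode = LockTupleKeylock;
            break;
        default:
            elog(ERROR, "unsupported rowmark type %d",
                 (int) erm->markType);
            lockmode = LockTupleExclusive;  /* keep compiler quiet */
            break;
    }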

> *** a/src/backend/nodes/outfuncs.c
> --- b/src/backend/nodes/outfuncs.c

> ***************
> *** 2181,2187 **** _outRowMarkClause(StringInfo str, RowMarkClause *node)
> WRITE_NODE_TYPE("ROWMARKCLAUSE");
>
> WRITE_UINT_FIELD(rti);
> ! WRITE_BOOL_FIELD(forUpdate);
> WRITE_BOOL_FIELD(noWait);
> WRITE_BOOL_FIELD(pushedDown);
> }
> --- 2181,2187 ----
> WRITE_NODE_TYPE("ROWMARKCLAUSE");
>
> WRITE_UINT_FIELD(rti);
> ! WRITE_BOOL_FIELD(strength);

WRITE_ENUM_FIELD?
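
Presumably:

    WRITE_ENUM_FIELD(strength, LockClauseStrength);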

> WRITE_BOOL_FIELD(noWait);
> WRITE_BOOL_FIELD(pushedDown);
> }
> *** a/src/backend/nodes/readfuncs.c
> --- b/src/backend/nodes/readfuncs.c
> ***************
> *** 299,305 **** _readRowMarkClause(void)
> READ_LOCALS(RowMarkClause);
>
> READ_UINT_FIELD(rti);
> ! READ_BOOL_FIELD(forUpdate);
> READ_BOOL_FIELD(noWait);
> READ_BOOL_FIELD(pushedDown);
>
> --- 299,305 ----
> READ_LOCALS(RowMarkClause);
>
> READ_UINT_FIELD(rti);
> ! READ_BOOL_FIELD(strength);

READ_ENUM_FIELD?
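
Likewise:

    READ_ENUM_FIELD(strength, LockClauseStrength);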

> *** a/src/backend/optimizer/plan/planner.c
> --- b/src/backend/optimizer/plan/planner.c

> ***************
> *** 1887,1896 **** preprocess_rowmarks(PlannerInfo *root)
> newrc = makeNode(PlanRowMark);
> newrc->rti = newrc->prti = rc->rti;
> newrc->rowmarkId = ++(root->glob->lastRowMarkId);
> ! if (rc->forUpdate)
> newrc->markType = ROW_MARK_EXCLUSIVE;
> ! else
> newrc->markType = ROW_MARK_SHARE;
> newrc->noWait = rc->noWait;
> newrc->isParent = false;
>
> --- 1887,1904 ----
> newrc = makeNode(PlanRowMark);
> newrc->rti = newrc->prti = rc->rti;
> newrc->rowmarkId = ++(root->glob->lastRowMarkId);
> ! switch (rc->strength)
> ! {
> ! case LCS_FORUPDATE:
> newrc->markType = ROW_MARK_EXCLUSIVE;
> ! break;
> ! case LCS_FORSHARE:
> newrc->markType = ROW_MARK_SHARE;
> + break;
> + case LCS_FORKEYLOCK:
> + newrc->markType = ROW_MARK_KEYLOCK;
> + break;
> + }

This needs a "default" clause throwing an error. (Seems like the default could
be in #ifdef USE_ASSERT_CHECKING, but we don't seem to ever do that.)
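
Something like:

    default:
        elog(ERROR, "unrecognized LockClauseStrength: %d",
             (int) rc->strength);
        break;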

> *** a/src/backend/tcop/utility.c
> --- b/src/backend/tcop/utility.c
> ***************
> *** 2205,2214 **** CreateCommandTag(Node *parsetree)
> else if (stmt->rowMarks != NIL)
> {
> /* not 100% but probably close enough */
> ! if (((RowMarkClause *) linitial(stmt->rowMarks))->forUpdate)
> tag = "SELECT FOR UPDATE";
> ! else
> tag = "SELECT FOR SHARE";
> }
> else
> tag = "SELECT";
> --- 2205,2225 ----
> else if (stmt->rowMarks != NIL)
> {
> /* not 100% but probably close enough */
> ! switch (((RowMarkClause *) linitial(stmt->rowMarks))->strength)
> ! {
> ! case LCS_FORUPDATE:
> tag = "SELECT FOR UPDATE";
> ! break;
> ! case LCS_FORSHARE:
> tag = "SELECT FOR SHARE";
> + break;
> + case LCS_FORKEYLOCK:
> + tag = "SELECT FOR KEY LOCK";
> + break;
> + default:
> + tag = "???";
> + break;

elog(ERROR) in the default clause, perhaps? See earlier comment.

> *** a/src/backend/utils/adt/ruleutils.c
> --- b/src/backend/utils/adt/ruleutils.c
> ***************
> *** 2837,2848 **** get_select_query_def(Query *query, deparse_context *context,
> if (rc->pushedDown)
> continue;
>
> ! if (rc->forUpdate)
> ! appendContextKeyword(context, " FOR UPDATE",
> -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
> ! else
> appendContextKeyword(context, " FOR SHARE",
> -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
> appendStringInfo(buf, " OF %s",
> quote_identifier(rte->eref->aliasname));
> if (rc->noWait)
> --- 2837,2858 ----
> if (rc->pushedDown)
> continue;
>
> ! switch (rc->strength)
> ! {
> ! case LCS_FORKEYLOCK:
> ! appendContextKeyword(context, " FOR KEY LOCK",
> -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
> ! break;
> ! case LCS_FORSHARE:
> appendContextKeyword(context, " FOR SHARE",
> -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
> + break;
> + case LCS_FORUPDATE:
> + appendContextKeyword(context, " FOR UPDATE",
> + -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
> + break;
> + }

Another switch statement; see earlier comment.

> *** a/src/backend/utils/cache/relcache.c
> --- b/src/backend/utils/cache/relcache.c

> ***************
> *** 3661,3675 **** RelationGetIndexAttrBitmap(Relation relation)
> --- 3665,3688 ----
> int attrnum = indexInfo->ii_KeyAttrNumbers[i];
>
> if (attrnum != 0)
> + {
> indexattrs = bms_add_member(indexattrs,
> attrnum - FirstLowInvalidHeapAttributeNumber);
> + if (indexInfo->ii_Unique)
> + uindexattrs = bms_add_member(uindexattrs,
> + attrnum - FirstLowInvalidHeapAttributeNumber);
> + }
> }
>
> /* Collect all attributes used in expressions, too */
> pull_varattnos((Node *) indexInfo->ii_Expressions, &indexattrs);
> + if (indexInfo->ii_Unique)
> + pull_varattnos((Node *) indexInfo->ii_Expressions, &uindexattrs);

No need; as Marti mentioned, such indexes are not usable for FOREIGN KEY, since
a foreign key can only reference a plain column list.

>
> /* Collect all attributes in the index predicate, too */
> pull_varattnos((Node *) indexInfo->ii_Predicate, &indexattrs);
> + if (indexInfo->ii_Unique)
> + pull_varattnos((Node *) indexInfo->ii_Predicate, &uindexattrs);

Likewise.

> *** a/src/include/access/htup.h
> --- b/src/include/access/htup.h
> ***************
> *** 163,174 **** typedef HeapTupleHeaderData *HeapTupleHeader;
> #define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
> #define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */
> #define HEAP_HASOID 0x0008 /* has an object-id field */
> ! /* bit 0x0010 is available */
> #define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */
> #define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */
> #define HEAP_XMAX_SHARED_LOCK 0x0080 /* xmax is shared locker */
> /* if either LOCK bit is set, xmax hasn't deleted the tuple, only locked it */
> ! #define HEAP_IS_LOCKED (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_SHARED_LOCK)
> #define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
> #define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
> #define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */
> --- 163,177 ----
> #define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
> #define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */
> #define HEAP_HASOID 0x0008 /* has an object-id field */
> ! #define HEAP_XMAX_KEY_LOCK 0x0010 /* xmax is a "key" locker */
> #define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */
> #define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */
> #define HEAP_XMAX_SHARED_LOCK 0x0080 /* xmax is shared locker */
> + /* if either SHARE or KEY lock bit is set, this is a "shared" lock */
> + #define HEAP_IS_SHARE_LOCKED (HEAP_XMAX_SHARED_LOCK | HEAP_XMAX_KEY_LOCK)
> /* if either LOCK bit is set, xmax hasn't deleted the tuple, only locked it */

"either" should now be "any".

> ! #define HEAP_IS_LOCKED (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_SHARED_LOCK | \
> ! HEAP_XMAX_KEY_LOCK)
> #define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
> #define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
> #define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */

> *** a/src/include/nodes/parsenodes.h
> --- b/src/include/nodes/parsenodes.h
> ***************
> *** 554,571 **** typedef struct DefElem
> } DefElem;
>
> /*
> ! * LockingClause - raw representation of FOR UPDATE/SHARE options
> *
> * Note: lockedRels == NIL means "all relations in query". Otherwise it
> * is a list of RangeVar nodes. (We use RangeVar mainly because it carries
> * a location field --- currently, parse analysis insists on unqualified
> * names in LockingClause.)
> */
> typedef struct LockingClause
> {
> NodeTag type;
> List *lockedRels; /* FOR UPDATE or FOR SHARE relations */
> ! bool forUpdate; /* true = FOR UPDATE, false = FOR SHARE */
> bool noWait; /* NOWAIT option */
> } LockingClause;
>
> --- 554,579 ----
> } DefElem;
>
> /*
> ! * LockingClause - raw representation of FOR UPDATE/SHARE/KEY LOCK options
> *
> * Note: lockedRels == NIL means "all relations in query". Otherwise it
> * is a list of RangeVar nodes. (We use RangeVar mainly because it carries
> * a location field --- currently, parse analysis insists on unqualified
> * names in LockingClause.)
> */
> + typedef enum LockClauseStrength
> + {
> + /* order is important -- see applyLockingClause */
> + LCS_FORKEYLOCK,
> + LCS_FORSHARE,
> + LCS_FORUPDATE
> + } LockClauseStrength;
> +

It's sure odd having this enum precisely mirror LockTupleMode. Is there
precedent for this? They are at opposite ends of the processing stack, I suppose.

> typedef struct LockingClause
> {
> NodeTag type;
> List *lockedRels; /* FOR UPDATE or FOR SHARE relations */
> ! LockClauseStrength strength;
> bool noWait; /* NOWAIT option */
> } LockingClause;
>
> ***************
> *** 839,856 **** typedef struct WindowClause
> * parser output representation of FOR UPDATE/SHARE clauses
> *
> * Query.rowMarks contains a separate RowMarkClause node for each relation
> ! * identified as a FOR UPDATE/SHARE target. If FOR UPDATE/SHARE is applied
> ! * to a subquery, we generate RowMarkClauses for all normal and subquery rels
> ! * in the subquery, but they are marked pushedDown = true to distinguish them
> ! * from clauses that were explicitly written at this query level. Also,
> ! * Query.hasForUpdate tells whether there were explicit FOR UPDATE/SHARE
> ! * clauses in the current query level.
> */
> typedef struct RowMarkClause
> {
> NodeTag type;
> Index rti; /* range table index of target relation */
> ! bool forUpdate; /* true = FOR UPDATE, false = FOR SHARE */
> bool noWait; /* NOWAIT option */
> bool pushedDown; /* pushed down from higher query level? */
> } RowMarkClause;
> --- 847,864 ----
> * parser output representation of FOR UPDATE/SHARE clauses
> *
> * Query.rowMarks contains a separate RowMarkClause node for each relation
> ! * identified as a FOR UPDATE/SHARE/KEY LOCK target. If one of these clauses
> ! * is applied to a subquery, we generate RowMarkClauses for all normal and
> ! * subquery rels in the subquery, but they are marked pushedDown = true to
> ! * distinguish them from clauses that were explicitly written at this query
> ! * level. Also, Query.hasForUpdate tells whether there were explicit FOR
> ! * UPDATE/SHARE clauses in the current query level.

Need a "/KEY LOCK" in the last sentence.

> */
> typedef struct RowMarkClause
> {
> NodeTag type;
> Index rti; /* range table index of target relation */
> ! LockClauseStrength strength;
> bool noWait; /* NOWAIT option */
> bool pushedDown; /* pushed down from higher query level? */
> } RowMarkClause;

I'd like to do some more testing around HOT and TOAST, plus run performance
tests. Figured I should get this much fired off, though.

Thanks,
nm

Attachment: fklocks-20110211.patch (text/plain, 66.4 KB)
