As previously commented, here's a proposal with patch to turn foreign
key checks into something less intrusive.
The basic idea, as proposed by Simon Riggs, was discussed in a previous
pgsql-hackers thread here:
It goes like this: instead of acquiring a shared lock on the involved
tuple, we only acquire a "key lock", that is, something that prevents
the tuple from going away entirely but not from updating fields that are
not covered by any unique index.
As discussed, this is still more restrictive than necessary (we could
lock only those columns that are involved in the foreign key being
checked), but that has all sorts of implementation level problems, so we
settled for this, which is still much better than the current state of
I published about this here:
So, as a rough design,
1. Create a new SELECT locking clause. For now, we're calling it SELECT FOR KEY LOCK
2. This will acquire a new type of lock in the tuple, dubbed a "keylock".
3. This lock will conflict with DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE.
4. It also conflicts with UPDATE if the UPDATE modifies an attribute
indexed by a unique index.
Here's a patch for this, on which I need to do some more testing and
Some patch details:
1. We use a new bit in t_infomask for HEAP_XMAX_KEY_LOCK, 0x0010.
2. Key-locking a tuple means setting the XMAX_KEY_LOCK bit, and setting the
Xmax to the locker (just like the other lock marks). If the tuple is
already key-locked, a MultiXactId needs to be created from the
original locker(s) and the new transaction.
3. The original tuple needs to be marked with the Cmax of the locking
command, to prevent it from being seen in the same transaction.
4. A non-conflicting update to the tuple must carry forward some fields
from the original tuple into the updated copy. Those include Xmax,
XMAX_IS_MULTI, XMAX_KEY_LOCK, and the CommandId and COMBO_CID flag.
5. We check for the is-indexed condition early in heap_update. This
check is independent of the HOT check, which occurs later in the
6. The relcache entry now keeps two lists of indexed attributes; the new
one only covers unique indexes. Both lists are built in a single
pass over the index list and saved in the relcache entry, so a
heap_update call only does this once. The main difference between
the two checks is that the one for HOT is done after the tuple has
been toasted. This cannot be done for this check, because the
toaster runs too late. This means some work is duplicated. We
could optimize this further.
Something else that might be of interest: the patch as presented here
does NOT solve the deadlock problem originally presented by Joel
Jacobson. It does solve the second, simpler example I presented in my
blog article referenced above, however. I need to have a closer look at
that problem to figure out if we could fix the deadlock too.
I need to thank Simon Riggs for the original idea, and Robert Haas for
some thoughtful discussion on IM that helped me figure out some
roadblocks. Of course, without the pgsql-hackers discussion there
wouldn't be any patch at all.
I also have to apologize to everyone for the lateness in this. Some
severe illness brought me down, then the holiday season slowed
everything almost to a halt, then a rushed but very much welcome move to
a larger house prevented me from dedicating the time I originally
intended. All those things are settled now, hopefully.
Description: application/octet-stream (66.3 KB)
pgsql-hackers by date
|Next:||From: Josh Berkus||Date: 2011-01-13 21:58:47|
|Subject: Re: [HACKERS] reviewers needed!|
|Previous:||From: Robert Haas||Date: 2011-01-13 21:54:01|
|Subject: Re: reviewers needed!|