From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: serializable lock consistency
Date: 2010-12-20 17:49:22
Message-ID: AA2EC79C-CC2C-450A-BF4D-6A037E2D0C08@phlo.org
Lists: pgsql-hackers

On Dec20, 2010, at 17:54 , Robert Haas wrote:
> On Mon, Dec 20, 2010 at 9:11 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> On Dec20, 2010, at 13:13 , Heikki Linnakangas wrote:
>>> One way to look at this is that the problem arises because SELECT FOR UPDATE doesn't create a new tuple like UPDATE does. The problematic case was:
>>>
>>>> T1 locks, T1 commits, T2 updates, T2 aborts, all after T0
>>>> took its snapshot but before T0 attempts to delete. :-(
>>>
>>> If T1 does a regular UPDATE, T2 doesn't overwrite the xmax on the original tuple, but on the tuple that T1 created.
>>
>>> So one way to handle FOR UPDATE would be to lazily turn the lock operation by T1 into a dummy update, when T2 updates the tuple. You can't retroactively make a regular update on behalf of the locking transaction that committed already, or concurrent selects would see the same row twice, but it might work with some kind of a magic tuple that's only followed through the ctid from the original one, and only for the purpose of visibility checks.
>>
>> In the case of an UPDATE of a recently locked tuple, we could avoid having to insert a dummy tuple by storing the old tuple's xmax in the new tuple's xmax. We'd flag the old tuple, and attempt to restore the xmax of any flagged tuple with an aborted xmax and a ctid != t_self during scanning and vacuuming.
>>
>> For DELETEs, that won't work. However, could we maybe abuse the ctid to store the old xmax? It currently contains t_self, but do we actually depend on that?
>
> My first reaction to all of this is that it sounds awfully grotty.
Well, there's precedent for overlaying fields in the tuple header for
space efficiency...
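
To spell out the problematic schedule Heikki quoted above with concrete
statements (table and column names are of course made up):

  -- session T0
  BEGIN ISOLATION LEVEL SERIALIZABLE;
  SELECT * FROM t WHERE id = 1;              -- T0 takes its snapshot here
  -- session T1
  BEGIN;
  SELECT * FROM t WHERE id = 1 FOR UPDATE;
  COMMIT;
  -- session T2
  BEGIN;
  UPDATE t SET val = val + 1 WHERE id = 1;
  ROLLBACK;
  -- session T0, again
  DELETE FROM t WHERE id = 1;

For the patch to do its job, T0's DELETE would have to fail with a
serialization error, since T1 locked the row after T0 took its snapshot.
But T2's aborted UPDATE has already overwritten the xmax that recorded
T1's lock, so by the time T0 deletes, no trace of the conflict is left.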

>> FOR-SHARE and FOR-UPDATE locks could preserve information about the latest committed locker by creating a multi-xid. For FOR-SHARE locks, we'd just need to ensure that we only remove all but one finished transactions. For FOR-UPDATE locks, we'd need to create a multi-xid if the old xmax is >= GlobalXmin, but I guess that's tolerable.
>
> Even in the original version of this patch, there's a non-trivial
> overhead here when a multi-xid exists that doesn't exist today: a
> serializable transaction has to grovel through the XIDs in the
> multi-xact and figure out whether any of them are new enough to be a
> problem.
It has to grovel through them anyway, either to see whether any of them is
still running (in the UPDATE/DELETE/FOR-UPDATE case) or to copy them into
a new multi-xid (in the FOR-SHARE case). Scanning them a second time thus
won't cause any further IO. If the second scan is really a concern here,
it could probably be folded into the first, but I doubt very much that
it'd be worth the added complexity.

Having to create a multi-xid for FOR-UPDATE locks is a real concern,
though. But if overlaying/abusing ctid proves to be possible in the case
of DELETEs, the same could be done for FOR-SHARE and FOR-UPDATE locks.
Only the UPDATE case would then remain...

> I fear that this whole approach is a case of trying to jam a
> square peg through a round hole. We're trying to force the on-disk
> format that we have to meet a requirement it wasn't designed for, and
> it's looking pretty ugly.
Certainly true. But most of the grotty-ness isn't inherently part
of a solution to the problem at hand. Rather, it's imposed on us
by the requirement for binary upgradability. IMHO, it's the price
we pay for having that.

> Kevin Grittner's work is a whole different
> approach to this problem, and while that's obviously not fully
> debugged and committed yet either, it's often easier to design a new
> tool to solve a particular problem than to make an existing tool that
> was really meant for something else do some new thing in addition.
Kevin's work only solves the problem at hand if we remove support for what
we currently call SERIALIZABLE entirely. As it stands, however, what we now
call SERIALIZABLE will still be available as isolation level REPEATABLE
READ, just as it is today, and will retain its quirky behaviour regarding
row-level locks. For some workloads at least, Kevin's approach also
introduces many more opportunities for spurious serialization failures.

Don't get me wrong here - I think what Kevin is doing is very important,
and will be a huge leap for postgres, putting us ahead of all commercial
and open-source competitors that I know of.

But having to pay that price just to enforce RI-style constraints slightly
beyond what FK constraints can express still seems wrong. For driving a
screw into a piece of wood, a screwdriver still beats even the most powerful
hammer, and leaves fewer traces on the wood...
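
To make the use case concrete: what I have in mind are trigger-enforced
constraints just slightly beyond what FOREIGN KEY can express, say "a child
row may only reference a parent that is still active". Such a trigger
typically takes a FOR SHARE lock on the parent row, much like the built-in
RI triggers do (all names below are made up):

  CREATE FUNCTION check_parent_active() RETURNS trigger AS $$
  BEGIN
      -- lock the parent row so it can't be deleted or deactivated
      -- underneath us while the child row is being inserted
      PERFORM 1 FROM parent WHERE id = NEW.parent_id AND active FOR SHARE;
      IF NOT FOUND THEN
          RAISE EXCEPTION 'parent % is missing or inactive', NEW.parent_id;
      END IF;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER child_check_parent
      BEFORE INSERT OR UPDATE ON child
      FOR EACH ROW EXECUTE PROCEDURE check_parent_active();

The weak spot today is that once the inserting transaction has committed,
a SERIALIZABLE transaction whose snapshot predates the insert can still
delete or deactivate the parent without any error - the committed FOR SHARE
lock no longer conflicts with anything, and the new child row isn't visible
to that snapshot. That is exactly the kind of conflict the patch is meant
to detect.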

BTW, I realized that preventing UPDATEs, DELETEs and FOR-UPDATE locks
from removing all traces of previous lock owners would also help to solve the
bug cautioned about here:
http://www.postgresql.org/docs/9.0/interactive/sql-select.html#SQL-FOR-UPDATE-SHARE

Briefly, the problem is that if you do

  BEGIN;
  SELECT * FROM mytable WHERE key = 1 FOR UPDATE;
  SAVEPOINT s;
  UPDATE mytable SET ... WHERE key = 1;
  ROLLBACK TO s;

the row ends up unlocked instead of locked FOR UPDATE.
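
A second session then no longer blocks on the lock session one thinks it
still holds (column name made up):

  -- session 2, run after session 1's ROLLBACK TO s
  UPDATE mytable SET val = 0 WHERE key = 1;
  -- goes through immediately, even though session 1 believes it still
  -- holds a FOR UPDATE lock on that row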

While the current behaviour of SERIALIZABLE transactions regarding row locks
can be categorized as a missing feature rather than a bug, this problem is
IMHO clearly a bug. A bug one can live with, yes, but still a bug.

For me, this is another very good reason to explore this further. Plus, it
improves the ratio of grotty-ness to problems solved ;-)

best regards,
Florian Pflug
