Re: serializable lock consistency

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: serializable lock consistency
Date: 2010-12-20 23:08:31
Message-ID: AANLkTiki_JdmgLhqWHtUCA4N92jPSXrCRnOq7NQQ8fEm@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 20, 2010 at 5:32 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Dec20, 2010, at 18:54 , Robert Haas wrote:
>> On Mon, Dec 20, 2010 at 12:49 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>>> For me, this is another very good reason to explore this further. Plus, it
>>> improves the ratio of grotty-ness vs. number-of-problems-solved ;-)
>>
>> By all means, look into it further.  I fear the boat is filling up
>> with water, but if you manage to come up with a workable solution I'll
>> be as happy as anyone, promise!
>
> I'll try to create a detailed proposal. To do that, however, I'll require
> some guidance on what's acceptable and what's not.
>
> Here's a summary of the preceding discussion
>
> To deal with aborted transactions correctly, we need to track the last
> locker of a particular tuple that actually committed. If we also want
> to fix the bug that causes a row lock to be lost upon doing
> lock;savepoint;update;restore that "latest committed locker" will
> sometimes need to be a set, since it'll need to store the outer
> transaction's xid as well as the latest actually committed locker.
>
> As long as no transaction aborts are involved, the tuple's xmax
> contains all the information we need. If a transaction updates,
> deletes or locks a row, the previous xmax is overwritten. If the
> transaction later aborts, we cannot decide whether it has previously
> been locked or not.
>
> And these ideas have come up
>
> A) Transactions that merely lock a row could put the previous
>   locker's xid (if >= GlobalXmin) *and* their own xid into a multi-xid,
>   and store that in xmax. For shared locks, this merely means cleaning
>   out the existing multi-xid a bit less aggressively. There's
>   no risk of bloat there, since we only need to keep one committed
>   xid, not all of them. For exclusive locks, we currently never
>   create a multi-xid. That'd change, we'd need to create one
>   if we find a previous locker with an xid >= GlobalXmin. This doesn't
>   solve the UPDATE and DELETE cases. For SELECT-FOR-SHARE this
>   is probably the best option, since it comes very close to what
>   we do currently.
>
> B) A transaction that UPDATEs or DELETEs a tuple could create an
>   intermediate lock-only tuple which'd contain the necessary
>   information about previous lock holders. We'd only need to do
>   that if there actually is one with xid >= GlobalXmin. We could
>   then choose whether to do the same for SELECT-FOR-UPDATE, or
>   whether we'd prefer to go with (A)
>
> C) The ctid field is only necessary for updated tuples. We could thus
>   overlay it with a field which stores the last committed locker after
>   a DELETE. UPDATEs could be handled either as in (B), or by storing the
>   information in the ctid-overlay in the *new* tuple. SELECT-FOR-UPDATE
>   could again either also use the ctid overlay or use (A).
>
> D) We could add a new tuple header field xlatest. To support binary
>   upgrade, we'd need to be able to read tuples without that field
>   also. We could then either create a new tuple version upon the
>   first lock request to such a tuple (which would then include the
>   new header), or we could simply raise a serialization error if
>   a serializable transaction tried to update a tuple without the
>   field whose xmax was aborted and >= GlobalXmin.
>
> I have the nagging feeling that (D) will meet quite some resistance. (C) wasn't
> too well received either, though I wonder if that'd change if the grotty-ness
> was hidden behind an API, much like the xvac/cmin/cmax overlay is. (B) seems like a
> lot of overhead, but maybe cleaner. More research is needed though to check
> how it'd interact with HOT and how to get the locking right. (A) is IMHO the
> best solution for the SELECT-FOR-SHARE since it's very close to what we do
> today.
>
> Any comments? Especially of the "don't you dare" kind?

I think any solution based on (D) has zero chance of being accepted,
and it wouldn't be that high except that probabilities can't be
negative. Unfortunately, I don't understand (A) or (B) well enough to
comment intelligently.

My previously expressed concern about (C) wasn't based on ugliness,
but rather on my belief that there is likely a whole lot of code
which relies on the CTID being a self-link when no UPDATE has
occurred. We'd have to be confident that all such cases had been
found and fixed, which might not be easy to be confident about.
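
To make that concern concrete, here is a minimal sketch of the kind of
check I have in mind. This is not PostgreSQL source: TidSketch,
TupleSketch, is_latest_version() and overlay_last_locker() are all
hypothetical stand-ins, and the layout is simplified for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a tuple identifier (block + offset). */
typedef struct {
    uint32_t block;
    uint16_t offset;
} TidSketch;

/* Hypothetical, simplified tuple header; only the fields needed here. */
typedef struct {
    uint32_t  xmin;
    uint32_t  xmax;
    TidSketch ctid;   /* points at the tuple itself unless it was updated */
} TupleSketch;

bool tid_equals(TidSketch a, TidSketch b)
{
    return a.block == b.block && a.offset == b.offset;
}

/* The assumption callers tend to make: ctid == own position means
 * "there is no newer version of this tuple". */
bool is_latest_version(const TupleSketch *tup, TidSketch self)
{
    assert(tup != NULL);
    return tid_equals(tup->ctid, self);
}

/* Assumed overlay from idea (C): reuse the ctid bits to remember the
 * last committed locker's xid. The encoding is invented for this sketch. */
void overlay_last_locker(TupleSketch *tup, uint32_t locker_xid)
{
    assert(tup != NULL);
    tup->ctid.block = locker_xid;
    tup->ctid.offset = 0;
}
```

Once overlay_last_locker() has run, is_latest_version() reports false
for a tuple that was never updated. Every caller that makes the
self-link assumption would have to learn about the overlay first.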

I have a more "meta" concern, too. Let's suppose (just for example)
that you write some absolutely brilliant code which makes it possible
to overlay CTID and which has the further advantage of being
stunningly obvious, so that we have absolute confidence that it is
correct. The obvious question is - if we can overlay CTID in some
situations, is there a better use for that than this? Just to throw
out something that might be totally impractical, maybe we could get
rid of XMAX. If the tuple hasn't been updated, the CTID field stores
the XMAX; if it HAS been updated, the CTID points to the successor
tuple, whose XMIN is our XMAX. I'm sure there are a bunch of reasons
why that doesn't actually work - I can think of some of them myself -
but the point is that there's an opportunity cost to stealing those
bits. Once they're dedicated to this purpose, they can't ever be
dedicated to any other purpose. Space in the tuple header is
precious, and I am not at all convinced that we should be willing to
surrender any for this. We have to believe not only that this change
is good, but also that it's more good than some other purpose to which
that bit space could potentially be put.
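
Purely to illustrate the throwaway idea above (which, again, probably
doesn't work), a rough sketch of reading XMAX out of an overlaid link
field could look like this. OverlayTuple, has_successor and
effective_xmax() are hypothetical names, the "heap" is just an array,
and the successor is an array index rather than a real CTID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuple header with no separate xmax field. */
typedef struct {
    uint32_t xmin;
    bool     has_successor;  /* assumed flag: was this tuple updated? */
    union {
        uint32_t xmax;       /* valid when has_successor is false */
        size_t   successor;  /* index of the newer version otherwise */
    } link;
} OverlayTuple;

/* Not updated: the overlaid field stores xmax directly.
 * Updated: the successor's xmin is, by construction, our xmax. */
uint32_t effective_xmax(const OverlayTuple *heap, size_t idx)
{
    assert(heap != NULL);
    const OverlayTuple *tup = &heap[idx];

    if (!tup->has_successor)
        return tup->link.xmax;
    return heap[tup->link.successor].xmin;
}
```

Even in this toy form, the updated case costs an extra tuple fetch just
to learn the xmax, and the bits are spoken for afterwards.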

Now obviously, overlaying onto space that's already reserved is better
than allocating new space. But it's not free.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
