Re: SSI freezing bug

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>
Subject: Re: SSI freezing bug
Date: 2013-10-07 09:26:37
Message-ID: 52527E4D.4060302@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06.10.2013 20:34, Kevin Grittner wrote:
> Note this comment, which I think was written by Heikki back when
> there was a lot more benchmarking and profiling of SSI going on:
>
> * Because a particular target might become obsolete, due to update to a new
> * version, before the reading transaction is obsolete, we need some way to
> * prevent errors from reuse of a tuple ID. Rather than attempting to clean
> * up the targets as the related tuples are pruned or vacuumed, we check the
> * xmin on access. This should be far less costly.
>
> Based on what this patch looks like, I'm afraid he may have been
> right when he wrote that. In any event, after the exercise of
> developing a first draft of searching for predicate locks to clean
> up every time a tuple is pruned or vacuumed, I continue to feel
> strongly that the previously-posted patch, which only takes action
> when tuples are frozen by a vacuum process, is much more suitable
> for backpatching. I don't think we should switch to anything
> resembling the attached without some careful benchmarking.

Hmm, you're probably right. I was thinking that the overhead of scanning
the lock hash on a regular vacuum is negligible, but I didn't consider
pruning. It might be significant there.

I'd like to give this line of investigation some more thought:

> On 04.10.2013 07:14, Dan Ports wrote:
>> On Thu, Oct 03, 2013 at 06:19:49AM -0700, Kevin Grittner wrote:
>>> Heikki Linnakangas<hlinnakangas(at)vmware(dot)com> wrote:
>>>> IMHO it would be better to remove xmin from the lock key, and vacuum
>>>> away the old predicate locks when the corresponding tuple is vacuumed.
>>>> The xmin field is only required to handle the case that a tuple is
>>>> vacuumed, and a new unrelated tuple is inserted to the same slot.
>>>> Removing the lock when the tuple is removed fixes that.
>>
>> This seems definitely safe: we need the predicate locks to determine if
>> someone is modifying a tuple we read, and certainly if it's eligible
>> for vacuum nobody's going to be modifying that tuple anymore.
>>
>>>> In fact, I cannot even come up with a situation where you would have a
>>>> problem if we just removed xmin from the key, even if we didn't vacuum
>>>> away old locks. I don't think the old lock can conflict with anything
>>>> that would see the new tuple that gets inserted in the same slot. I have
>>>> a feeling that you could probably prove that if you stare long enough at
>>>> the diagram of a dangerous structure and the properties required for a
>>>> conflict.
>>
>> This would also be safe, in the sense that it's OK to flag a
>> conflict even if one doesn't exist. I'm not convinced that it isn't
>> possible to have false positives this way. I think it's possible for a
>> tuple to be vacuumed away and the ctid reused before the predicate
>> locks on it are eligible for cleanup. (In fact, isn't this what was
>> happening in the thread Kevin linked?)

Time to stare at the dangerous structure again:

> SSI is based on the observation [2] that each snapshot isolation
> anomaly corresponds to a cycle that contains a "dangerous structure"
> of two adjacent rw-conflict edges:
>
> Tin ------> Tpivot ------> Tout
> rw rw
>

Furthermore, Tout must commit first.

The following are true:

* A transaction can only hold a predicate lock on a tuple that's visible
to it.

* A tuple that's visible to Tin or Tpivot cannot be vacuumed away until
Tout commits.

When updating a tuple, CheckTargetForConflictsIn() only marks a conflict
if the transaction holding the predicate lock overlapped with the
updating transaction. And if a tuple is vacuumed away and the slot is
reused, an transaction updating the new tuple cannot overlap with any
transaction holding a lock on the old tuple. Otherwise the tuple
wouldn't have been eligible for vacuuming in the first place.

So I don't think you can ever get a false conflict because of slot
reuse. And if there's a hole in that thinking I can't see right now, the
worst that will happen is some unnecessary conflicts, ie. it's still
correct. It surely can't be worse than upgrading the lock to a
page-level lock, which would also create unnecessary conflicts.

Summary: IMHO we should just remove xmin from the predicate lock tag.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2013-10-07 11:51:44 old warning in docs
Previous Message Amit Kapila 2013-10-07 06:09:55 Fwd: Patch for reserved connections for replication users