Re: SSI freezing bug

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com>
Subject: Re: SSI freezing bug
Date: 2013-09-26 09:46:14
Message-ID: 52440266.5040708@vmware.com
Lists: pgsql-hackers

On 23.09.2013 01:07, Hannu Krosing wrote:
> On 09/20/2013 12:55 PM, Heikki Linnakangas wrote:
>> Hi,
>>
>> Prompted by Andres Freund's comments on my Freezing without Write I/O
>> patch, I realized that there's an existing bug in the way
>> predicate locking handles freezing (or rather, it doesn't handle it).
>>
>> When a tuple is predicate-locked, the key of the lock is ctid+xmin.
>> However, when a tuple is frozen, its xmin is changed to FrozenXid.
>> That effectively invalidates any predicate lock on the tuple, as
>> checking for a lock on the same tuple later won't find it as the xmin
>> is different.
>>
>> Attached is an isolationtester spec to demonstrate this.
> The case is even fishier than that.
>
> That is, you can get bad behaviour on at least v9.2.4 even without
> VACUUM FREEZE.
>
> You just need to run
>
> permutation "r1" "r2" "w1" "w2" "c1" "c2"
>
> twice in a row.
>
> The first time it does get a serialization error at "c2",
> but the second time both "c1" and "c2" complete successfully.

Oh, interesting. I did some debugging on this: there are actually *two*
bugs, either one of which is enough to cause this on its own:

1. In heap_hot_search_buffer(), the PredicateLockTuple() call is passed the
wrong offset number. heapTuple->t_self is set to the tid of the first
tuple in the chain that's visited, not the one actually being read.

2. CheckForSerializableConflictIn() uses the tuple's t_ctid field
instead of t_self to check for existing predicate locks on the tuple. If
the tuple was updated, but the updater rolled back, t_ctid points to the
aborted dead tuple.
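
To illustrate why either bug alone is enough, here is a minimal toy model in
C. It is only a sketch, not the PostgreSQL code: ToyTid, ToyTuple, toy_lock(),
toy_is_locked() and toy_conflict_detected() are hypothetical names made up for
this example, and the lock table is a plain array keyed by tid alone. The
assumed scenario is a HOT chain whose root is at offset 1, whose visible
member is at offset 2, and whose t_ctid points at an aborted update at
offset 3.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a tuple identifier: block number plus offset number. */
typedef struct { unsigned block; unsigned offset; } ToyTid;

/* Toy stand-in for a heap tuple: its own location and its forward link. */
typedef struct { ToyTid t_self; ToyTid t_ctid; } ToyTuple;

#define TOY_MAX_LOCKS 8
static ToyTid toy_locks[TOY_MAX_LOCKS];
static int toy_nlocks = 0;

/* Record a "predicate lock" on the given tid. */
static void toy_lock(ToyTid tid)
{
    assert(toy_nlocks < TOY_MAX_LOCKS);
    toy_locks[toy_nlocks++] = tid;
}

/* Return true if a lock was recorded on exactly this tid. */
static bool toy_is_locked(ToyTid tid)
{
    for (int i = 0; i < toy_nlocks; i++)
        if (toy_locks[i].block == tid.block &&
            toy_locks[i].offset == tid.offset)
            return true;
    return false;
}

/*
 * Simulate a reader locking a tuple and a writer later checking for that
 * lock. The two flags switch each of the two bugs on independently.
 */
static bool toy_conflict_detected(bool reader_locks_chain_root,
                                  bool writer_checks_ctid)
{
    ToyTid   root = {0, 1};             /* first tuple visited in the chain */
    ToyTuple live = {{0, 2}, {0, 3}};   /* tuple actually read; t_ctid points
                                         * at an aborted update at offset 3 */

    toy_nlocks = 0;

    /* Bug 1: lock the chain root instead of the tuple being read. */
    toy_lock(reader_locks_chain_root ? root : live.t_self);

    /* Bug 2: probe with t_ctid instead of t_self. */
    return toy_is_locked(writer_checks_ctid ? live.t_ctid : live.t_self);
}
```

In this model the conflict is only detected when both flags are false, i.e.
when the reader locks the tuple it actually read and the writer probes with
t_self.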

After fixing both of those bugs, running the test case twice in a row
works, i.e. causes a conflict and a rollback both times. Does anyone see a
problem with this?

That still leaves the original problem I spotted, with freezing; that's
yet another unrelated bug.
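
For reference, the freezing problem can be sketched with a similar toy model.
Again this is an illustration under assumptions, not the real lock manager:
ToyLockKey and the helper functions are hypothetical, and TOY_FROZEN_XID
stands in for the frozen xid (FrozenTransactionId, which is 2 in PostgreSQL).
The key is ctid+xmin, as described in the quoted message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the frozen transaction id. */
#define TOY_FROZEN_XID 2u

/* Toy predicate-lock key: tuple location (ctid) plus the tuple's xmin. */
typedef struct {
    uint32_t block;
    uint16_t offset;
    uint32_t xmin;
} ToyLockKey;

static bool toy_key_equal(ToyLockKey a, ToyLockKey b)
{
    return a.block == b.block && a.offset == b.offset && a.xmin == b.xmin;
}

/*
 * Take a lock keyed on the tuple's current xmin, optionally freeze the
 * tuple, then probe for the lock using the xmin now stored in the tuple.
 * Returns true if the probe finds the lock.
 */
static bool toy_lock_found(uint32_t original_xmin, bool freeze_in_between)
{
    /* Only a normal (non-frozen) xid makes sense as the starting xmin. */
    assert(original_xmin > TOY_FROZEN_XID);

    uint32_t   tuple_xmin = original_xmin;
    ToyLockKey held = {0, 1, tuple_xmin};   /* key recorded at lock time */

    if (freeze_in_between)
        tuple_xmin = TOY_FROZEN_XID;        /* freezing overwrites xmin */

    ToyLockKey probe = {0, 1, tuple_xmin};  /* key built at check time */
    return toy_key_equal(held, probe);
}
```

With freeze_in_between set, the probe key no longer matches the held key, so
the lock is effectively lost even though the tuple has not moved.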

- Heikki

Attachment Content-Type Size
ssi-hot-fix-1.patch text/x-diff 1.6 KB
