|From:||Heikki Linnakangas <hlinnaka(at)iki(dot)fi>|
|To:||Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>|
|Cc:||Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>|
|Subject:||Re: Remove HeapTuple and Buffer dependency for predicate locking functions|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 06/08/2019 07:20, Thomas Munro wrote:
> On Tue, Aug 6, 2019 at 4:35 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2019-08-05 20:58:05 +1200, Thomas Munro wrote:
>>> 1. Commit dafaa3efb75 "Implement genuine serializable isolation
>>> level." (2011) locked the root tuple, because it set t_self to *tid.
>>> Possibly due to confusion about the effect of the preceding line
>>> ItemPointerSetOffsetNumber(tid, offnum).
>>> 2. Commit commit 81fbbfe3352 "Fix bugs in SSI tuple locking." (2013)
>>> fixed that by adding ItemPointerSetOffsetNumber(&heapTuple->t_self,
>> Hm. It's not at all sure that it's ok to report the non-root tuple tid
>> here. I.e. I'm fairly sure there was a reason to not just set it to the
>> actual tid. I think I might have written that up on the list at some
>> point. Let me dig in memory and list. Obviously possible that that was
>> also obsoleted by parallel changes.
> Adding Heikki and Kevin.
> I haven't found your earlier discussion about that yet, but would be
> keen to read it if you can find it. I wondered if your argument
> might have had something to do with heap pruning, but I can't find a
> problem there. It's not as though the TID of any visible tuples
> change, it's just that some dead stuff goes away and the root line
> pointer is changed to LP_REDIRECT if it wasn't already.
> As for the argument for locking the tuple we emit rather than the HOT
> root, I think the questions are: (1) How exactly do we get away with
> locking only one version in a regular non-HOT update chain? (2) Is it
> OK to do that with a HOT root?
> The answer to the first question is given in README-SSI.
> Unfortunately it doesn't discuss HOT directly, but I suspect the
> answer is no, HOT is not special here. By my reading, it relies on
> the version you lock being the version visible to your snapshot, which
> is important because later updates have to touch that tuple to write
> the next version. That doesn't apply to some arbitrarily older tuple
> that happens to be a HOT root. Concretely, heap_update() does
> CheckForSerializableConflictIn(relation, &oldtup, buffer), which is
> only going to produce a rw conflict if T1 took an SIREAD on precisely
> the version T2 locks in that path, not some arbitrarily older version
> that happens to be a HOT root. A HOT root might never be considered
> again by concurrent writers, no?
Your analysis is spot on. Thanks for the clear write-up!
>>> This must be in want of an isolation test, but I haven't yet tried to
>>> get my head around how to write a test that would show the difference.
> One practical problem is that the only way to reach
> PredicateLockTuple() is from an index scan, and the index scan locks
> the index page (or the whole index, depending on
> rd_indam->ampredlocks). So I think if you want to see a serialization
> anomaly you'll need multiple indexes (so that index page locks don't
> hide the problem), a long enough HOT chain and then probably several
> transactions to be able to miss a cycle that should be picked up by
> the logic in . I'm out of steam for this problem today though.
I had some steam, and wrote a spec that reproduces this bug. It wasn't
actually that hard to reproduce, fortunately. Or unfortunately; people
might well be hitting it in production. I used the "freezetest.spec"
from the 2013 thread as the starting point, and added one UPDATE to the
initialization, so that the test starts with an already HOT-updated
tuple. It should throw a serialization error, but on current master, it
does not. After applying your fix.txt, it does.
Your fix.txt seems correct. For clarity, I'd prefer moving things around
a bit, though, so that the t_self is set earlier in the function, at the
same place where the other HeapTuple fields are set. And set blkno and
offnum together, in one ItemPointerSet call. With that, I'm not sure we
need such a verbose comment explaining why t_self needs to be updated
but I kept it for now.
Attached is a patch that contains your fix.txt with the changes for
clarity mentioned above, and an isolationtester test case.
PS. Because heap_hot_search_buffer() now always sets heapTuple->t_self
to the returned tuple version, updating *tid is redundant. And the call
in heapam_index_fetch_tuple() wouldn't need to do
"bslot->base.tupdata.t_self = *tid".
|Next Message||Peter Eisentraut||2019-08-06 09:49:13||Update to DocBook 4.5|
|Previous Message||Amit Langote||2019-08-06 09:17:09||Re: Fix a typo in add_partial_path|