Re: Errors on missing pg_subtrans/ files with 9.3

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: J Smith <dark(dot)panda+lists(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Errors on missing pg_subtrans/ files with 9.3
Date: 2013-11-25 20:10:39
Message-ID: 20131125201039.GF6597@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund escribió:

> Ok, this is helpful. Do you rather longrunning transactions? The
> transaction that does foreign key checks has an xid of 10260613, while
> the row that's getting checked has 13514992.

Thanks for the analysis.

> #4 0x0000000000635dc7 in XactLockTableWait (xid=13514992) at lmgr.c:501
> tag = {locktag_field1 = 13514992, locktag_field2 = 0, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 4 '\004', locktag_lockmethodid = 1 '\001'}
> #5 0x0000000000482223 in heap_lock_updated_tuple_rec (rel=0x2b20f050a8d0, tuple=<value optimized out>, ctid=<value optimized out>, xid=10260613, mode=LockTupleKeyShare) at heapam.c:4847
>
> I am not sure whether that's the origin of the problem but at the very
> least it seems to me that heap_lock_updated_tuple_rec() is missing
> several important pieces:
> a) do the priorXmax==xmin dance to check we're still following the same
> ctid chain. Currently we could easily stumble across completely
> unrelated tuples if a chain element aborted and got vacuumed.

This seems simple to handle by adding the check you propose to the loop.
Basically if the xmax doesn't match the xmin, we reached the end,
there's nothing more to lock and we can return success without any
further work:

/*
* Check the tuple XMIN against prior XMAX, if any. If we reached
* the end of the chain, we're done, so return success.
*/
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(HeapTupleHeaderGetXmin(&mytup),
priorXmax))
{
UnlockReleaseBuffer(buf);
return HeapTupleMayBeUpdated;
}

> b) Check whether a chain element actually aborted - currently we're
> only doing that in the HEAP_KEYS_UPDATED updated case, but that seems
> wrong (we can't check for committed tho!).

Let me point out that this is exactly the same code that would be
affected by my proposed fix for #8434, which would have this check the
updateXid in all cases, not only in KEYS_UPDATED as currently.

I don't understand your comment about "can't check for committed". I
mean, if the updating transaction committed, then we need to return
failure to caller, HeapTupleUpdated. This signals to the caller that
the future version of the tuple is locked and the whole thing needs to
be failed. (In READ COMMITTED isolation level, this would cause
EvalPlanQual to fetch the updated version of the tuple and redo the
whole thing from there. In REPEATABLE READ and above, the whole
operation would be aborted.)

In short I propose to fix part this with the simple patch I proposed for
bug #8434.

> c) (reported separately as well) cope with failure of heap_fetch() to
> return anything.

Proposed patch for this was posted in the other thread.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message AK 2013-11-25 20:37:49 Re: Why is UPDATE with column-list syntax not implemented
Previous Message Kevin Grittner 2013-11-25 20:10:22 Re: [PATCH] SQL assertions prototype