Re: Assert() failures during RI checks

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Assert() failures during RI checks
Date: 2020-03-23 17:24:55
Message-ID: 20200323172455.kqujuvpuvru7zrnn@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-03-22 18:30:04 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I wonder if we shouldn't just change the crosscheck case to set
> > something other than TM_Updated, as it's not really accurate to say the
> > tuple was updated.
>
> Yeah, I was wondering about giving that a new result code, too.
> It would be a little bit invasive and not at all back-patchable,
> but (say) TM_SerializationViolation seems like a cleaner output ---
> and we could define it to not return any TID info, as TM_Delete
> doesn't. But we also need a fix that *can* be back-patched.

I wonder if returning TM_Invisible in the backbranches would be the
right answer. "The affected tuple wasn't visible to the relevant
snapshot" is not the worst description for a crosscheck snapshot
violation.

We would have to start issuing a better error message for it, as it's
unexpected in most places right now. It'd be kind of odd for
heap_delete/update() to error out with "attempted to delete invisible
tuple" but still return it in some cases, though.

> >> 2. Does the following stanza actually need to allow for the case
> >> of a deleted tuple as well as an updated one? If so, blindly
> >> trying to follow the ctid link is not so cool.
>
> > Which ctid link following are you referring to? I'm not sure I quite
> > follow.
>
> The stanza right after the crosscheck one has always assumed that
> it is dealing with an outdated-by-update tuple, so that chasing up
> to the next TID is a sane thing to do.

Do you just mean the
tmfd->ctid = tp.t_data->t_ctid;
and
tmfd->xmax = HeapTupleHeaderGetUpdateXid(tp.t_data);
? I would not have described that as chasing, which would explain my
difficulty following.

> Maybe that's been wrong
> since day one, or maybe we made it wrong somewhere along the line,
> but it seems clearly wrong now (even without accounting for the
> serialization crosscheck case). I think it just accidentally
> doesn't cause problems in non-assert builds, as long as nobody
> is assuming too much about which TID or XID gets returned.

It'd probably be worthwhile to rejigger those branches to something
roughly like:

if (result != TM_Ok)
{
switch (result)
{
case TM_SelfModified:
Assert(!ItemPointerEquals(&tp.t_self, &tp.t_data->t_ctid));
tmfd->ctid = tp.t_data->t_ctid;
tmfd->xmax = HeapTupleHeaderGetUpdateXid(tp.t_data);
tmfd->cmax = HeapTupleHeaderGetCmax(tp.t_data);
break;
case TM_Updated:
Assert(!ItemPointerEquals(&tp.t_self, &tp.t_data->t_ctid));
tmfd->ctid = tp.t_data->t_ctid;
tmfd->xmax = HeapTupleHeaderGetUpdateXid(tp.t_data);
tmfd->cmax = InvalidCommandId;
break;
case TM_Deleted:
tmfd->ctid = tp.t_self;
tmfd->xmax = HeapTupleHeaderGetUpdateXid(tp.t_data);
tmfd->cmax = InvalidCommandId;
break;
case TM_BeingModified:
Assert(!ItemPointerEquals(&tp.t_self, &tp.t_data->t_ctid));
tmfd->ctid = tp.t_data->t_ctid;
tmfd->xmax = HeapTupleHeaderGetUpdateXid(tp.t_data);
tmfd->cmax = InvalidCommandId;
break;
default:
elog(ERROR, "unexpected visibility %d", result);
}

UnlockReleaseBuffer(buffer);
if (have_tuple_lock)
UnlockTupleTuplock(relation, &(tp.t_self), LockTupleExclusive);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
return result;
}

> > But for the crosscheck case (most of?)
> > those wouldn't ever reach the ctid equivalency check, because of
> > + if (IsolationUsesXactSnapshot())
> > + ereport(ERROR,
> > + (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
> > + errmsg("could not serialize access due to concurrent update")));
> > like checks beforehand.
>
> Yeah. For HEAD I'm imagining that callers would just throw
> ERRCODE_T_R_SERIALIZATION_FAILURE unconditionally if they
> get back TM_SerializationViolation.

I am wondering if the correct HEAD answer would be to somehow lift the
crosscheck logic out of heapam.c. ISTM that heapam.c is kind of too low
level for that type of concern. But it seems hard to do that without
performing unnecessary updates that immediately are thrown away by
throwing an error at a higher level.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Glukhov 2020-03-23 17:28:52 Re: SQL/JSON: functions
Previous Message James Coleman 2020-03-23 17:16:44 Re: [PATCH] Incremental sort (was: PoC: Partial sort)