Re: Assert() failures during RI checks

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Assert() failures during RI checks
Date: 2020-03-23 17:54:31
Message-ID: 20018.1584986071@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2020-03-22 18:30:04 -0400, Tom Lane wrote:
>> Yeah, I was wondering about giving that a new result code, too.
>> It would be a little bit invasive and not at all back-patchable,
>> but (say) TM_SerializationViolation seems like a cleaner output ---
>> and we could define it to not return any TID info, as TM_Delete
>> doesn't. But we also need a fix that *can* be back-patched.

> I wonder if returning TM_Invisible in the backbranches would be the
> right answer. "The affected tuple wasn't visible to the relevant
> snapshot" is not the worst description for a crosscheck snapshot
> violation.

I'm not for that. Abusing an existing result code is what got us
into this mess in the first place; abusing a different result code
than what we did in prior minor releases can only make things worse
not better. (Of course, if there's no external callers of these
functions then we could get away with changing it ... but I'm not
prepared to assume that, are you?)

Also it'd mean something quite different in the direct output of
HeapTupleSatisfiesUpdate than what it would mean one level up,
which is certain to cause confusion.

I think the right thing in the back branches is to keep on returning
the "Updated" result code, but adjust the crosscheck code path so that
the TID that's sent back is always the tuple's own TID.

>> The stanza right after the crosscheck one has always assumed that
>> it is dealing with an outdated-by-update tuple, so that chasing up
>> to the next TID is a sane thing to do.

> Do you just mean the
> tmfd->ctid = tp.t_data->t_ctid;
> and
> tmfd->xmax = HeapTupleHeaderGetUpdateXid(tp.t_data);
> ? I would not have described that as chasing, which would explain my
> difficulty following.

The bit about returning t_ctid rather than the tuple's own TID seems
like chasing a next-tid link to me. The Assert about xmax being valid
is certainly intended to verify that that's what is happening. I can't
speak to what the code thinks it's returning in tmfd->xmax, because
that functionality wasn't there the last time I studied this.

> I am wondering if the correct HEAD answer would be to somehow lift the
> crosscheck logic out of heapam.c. ISTM that heapam.c is kind of too low
> level for that type of concern. But it seems hard to do that without
> performing unnecessary updates that immediately are thrown away by
> throwing an error at a higher level.

Yeah, I'd be the first to agree that the crosscheck stuff is messy.
But it's hard to see how to do better.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-03-23 18:21:43 Re: RFC: Add 'taint' field to pg_control.
Previous Message Ashwin Agrawal 2020-03-23 17:43:09 Re: Make mesage at end-of-recovery less scary.