Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-07-14 01:10:11
Message-ID: 20200714011011.6ojuvawg7bec3byp@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-07-13 20:47:10 -0400, Robert Haas wrote:
> On Mon, Jul 13, 2020 at 6:38 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Not fully, I'm afraid. AFAICT it doesn't currently tell you the item
> > pointer offset, just the block number, right? We probably should extend
> > it to also include the offset...
>
> Oh, I hadn't realized that limitation. That would be good to fix.

Yeah. And it'd be even more useful if we were to end up implementing your
suggestion below about continuing to vacuum the other tuples.
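For illustration: a heap tuple is identified by a (block number, item offset) pair, the same pair shown as a ctid such as `(42,7)`. Below is a minimal standalone sketch of a corruption message that names both parts. It is not PostgreSQL's actual `ItemPointerData` or `ereport` code. The `TupleId` type, the `format_xmin_error` function, and the exact message suffix are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tuple identifier: the block number plus the
 * 1-based line pointer (item) offset within that block. */
typedef struct TupleId
{
    uint32_t block;
    uint16_t offset;
} TupleId;

/* Format a corruption message naming both block and offset, so the exact
 * tuple can be located afterwards (e.g. by its ctid). The wording after
 * "relfrozenxid %u" is invented for this sketch.
 * Returns what snprintf returns: the length the full message needs. */
static int
format_xmin_error(char *buf, size_t len, uint32_t xmin,
                  uint32_t relfrozenxid, TupleId tid)
{
    return snprintf(buf, len,
                    "found xmin %u from before relfrozenxid %u "
                    "at tuple (%u,%u)",
                    (unsigned) xmin, (unsigned) relfrozenxid,
                    (unsigned) tid.block, (unsigned) tid.offset);
}
```

With xmin 100, relfrozenxid 500 and tuple `{42, 7}`, this produces a message ending in `at tuple (42,7)`. Knowing the offset as well as the block is what would let one target the single bad tuple rather than a whole page.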

> It would be even better, I think, if we could have VACUUM proceed with
> the rest of vacuuming the table, emitting warnings about each
> instance, instead of blowing up when it hits the first bad tuple, but
> I think you may have told me sometime that doing so would be, uh, less
> than straightforward.

Yea, it's not that simple to implement. Not impossible either.

> We probably should refuse to update relfrozenxid/relminmxid when this
> is happening, but I *think* it would be better to still proceed with
> dead tuple cleanup as far as we can, or at least have an option to
> enable that behavior. I'm not positive about that, but not being able
> to complete VACUUM at all is a FAR more urgent problem than not being
> able to freeze, even though in the long run the latter is more severe.

I'm hesitant to default to removing tuples once we've figured out that
something is seriously wrong. That could easily make us plow ahead and
delete valuable data in other tuples, even after we'd already detected
that there's a problem. But I also see the problem you raise. And it's not
academic: IIRC, a number of the multixact corruption issues these checks
detected weren't guaranteed to be caught.
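Robert's proposal can be sketched roughly as follows. The scan keeps going instead of stopping at the first bad tuple. It warns about every tuple whose xmin precedes relfrozenxid and then refuses to advance relfrozenxid if any were found. Because of Andres's concern above, the sketch only reports and never decides whether dead tuples are still removed.

The comparison mirrors the modulo-2^32 idea behind PostgreSQL's `TransactionIdPrecedes` for normal XIDs, but it ignores the special permanent XIDs. `check_xmins` and `may_advance_relfrozenxid` are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Xid;

/* Simplified wraparound-aware "a precedes b" for normal 32-bit XIDs:
 * the unsigned difference is reinterpreted as signed, so XIDs less than
 * 2^31 apart compare correctly across wraparound. */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical scan: rather than erroring out on the first bad tuple,
 * warn about every tuple whose xmin precedes relfrozenxid and return
 * how many such tuples were found. */
static int
check_xmins(const Xid *xmins, int ntuples, Xid relfrozenxid)
{
    int ncorrupt = 0;

    for (int i = 0; i < ntuples; i++)
    {
        if (xid_precedes(xmins[i], relfrozenxid))
        {
            fprintf(stderr,
                    "WARNING: found xmin %u from before relfrozenxid %u "
                    "(tuple %d)\n",
                    (unsigned) xmins[i], (unsigned) relfrozenxid, i);
            ncorrupt++;
        }
    }
    return ncorrupt;
}

/* relfrozenxid may only move forward if no corrupt tuple was seen. */
static bool
may_advance_relfrozenxid(int ncorrupt)
{
    return ncorrupt == 0;
}
```

For example, with xmins `{600, 100, 700, 499}` and relfrozenxid 500, two warnings are emitted and relfrozenxid stays put. Whether the remaining dead tuples get cleaned up in that case is left to the caller, which is exactly the policy question discussed above.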

> > > 2. In some other, similar situations, e.g. where the tuple data is
> > > garbled, it's often possible to get out from under the problem by
> > > deleting the tuple at issue. But I think that doesn't necessarily fix
> > > anything in this case.
> >
> > Huh, why not? That worked in the cases I saw.
>
> I'm not sure I've seen a case where that didn't work, but I don't see
> a reason why it couldn't happen. Do you think the code is structured
> in such a way that a deleted tuple is guaranteed to be pruned even if
> the XID is old?

I think so, leaving aside some temporary situations perhaps.

> What if clog has been truncated so that the xmin can't be looked up?

That's possible, but probably only in cases where xmin actually
committed.

Greetings,

Andres Freund
