Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-09-01 09:16:06
Message-ID: CA+fd4k67ke0uj4ioK03o7bepxHadyM64c+veop+np_GKZk3TsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 28 Aug 2020 at 23:39, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Aug 28, 2020 at 4:07 AM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > You've removed the description about executing VACUUM with
> > DISABLE_PAGE_SKIPPING option on the target relation after using
> > pg_surgery functions from the doc but I guess it’s better to recommend
> > that in the doc for safety. Could you please tell me the reason for
> > removing that?
>
> Well, I think that was added because there wasn't code to clear the
> visibility map bits, either page-level in the map, but we added code
> for that, so now I don't really see why it's necessary or even
> desirable.
>
> Here are a few example scenarios:
>
> 1. My table is not corrupt. For no particular reason, I force-freeze
> or force-kill a tuple which is neither dead nor all-visible.
> Concurrent queries might return wrong answers, but the table is not
> corrupt. It does not require VACUUM and would not benefit from it.
> Actually, it doesn't need anything at all.
>
> 2. My table is not corrupt. For no particular reason, I force-freeze a
> tuple which is dead. I believe it's possible that the index entries
> for that tuple might be gone already, but VACUUM won't fix that.
> REINDEX or a table rewrite would, though. It's also possible, if the
> dead tuple was added by an aborted transaction which added columns to
> the table, that the tuple might have been created using a tuple
> descriptor that differs from the table's current tuple descriptor. If
> so, I think scanning the table could produce a crash. VACUUM won't fix
> this, either. I would need to delete or force-kill the offending
> tuple.
>
> 3. I have one or more tuples in my table that are intact except that
> they have garbage values for xmin, resulting in VACUUM failure or
> possibly even SELECT failure if the CLOG entries are also missing. I
> force-kill or force-freeze them. If by chance the affected tuples were
> also omitted from one or more indexes, a REINDEX or table rewrite is
> needed to fix them, but a VACUUM will not help. On the other hand, if
> those tuples are present in the indexes, there's no remaining problem
> and VACUUM is not needed for the purpose of restoring the integrity of
> the table. If the problem has been ongoing for a while, VACUUM might
> be needed to advance relfrozenxid, but that doesn't require
> DISABLE_PAGE_SKIPPING.
>
> 4. I have some pages in my table that have incorrect visibility map
> bits. In this case, I need VACUUM (DISABLE_PAGE_SKIPPING). However, I
> don't need the functions we're talking about here at all unless I also
> have tuples with corrupted visibility information. If I do happen to
> have both tuples with corrupted visibility information and also pages
> with incorrect visibility map bits, then I suppose I need both these
> tools and also VACUUM (DISABLE_PAGE_SKIPPING). Probably, I'll want to
> do the VACUUM second. But, if I happened to do the VACUUM first and
> then use these functions afterward, the worst thing that could happen
> is that I might end up with a some dead tuples that could've gotten
> removed faster if I'd switched the order. And that's not a disaster.
>
> Basically, I can see no real reason to recommend VACUUM
> (DISABLE_PAGE_SKIPPING) here. There are problems that can be fixed
> with that command, and there are problems that can be fixed by this
> method, but they are mostly independent of each other. We should not
> recommend that people run VACUUM "just in case." That kind of fuzzy
> thinking seems relatively prevalent already, and it leads to people
> spending a lot of time running slow maintenance commands that do
> nothing to help them, and which occasionally make things worse.
>

Thank you for your explanation. That very makes sense to me.

If vacuum could fix the particular kind of problem by using together
with pg_surgery we could recommend using vacuum. But I agree that the
corruption of heap table is not the case.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-09-01 09:19:03 Re: Disk-based hash aggregate's cost model
Previous Message Laurenz Albe 2020-09-01 09:06:00 Re: Is it possible to set end-of-data marker for COPY statement.