Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-08-18 11:21:38
Message-ID: CAE9k0PmeC0kM-FoA4P5XFWcgwtQToFk07RaTCYeiYZLdtYrXHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is the new version of patch that addresses the comments from
Asim Praveen and Masahiko-san. It also improves the documentation to
some extent.

On Tue, Aug 18, 2020 at 1:46 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> Hello Masahiko-san,
>
> I've spent some more time trying to understand the code in
> lazy_scan_heap function to know under what all circumstances a VACUUM
> can fail with "found xmin ... before relfrozenxid ..." error for a
> tuple whose xmin is behind relfrozenxid. Here are my observations:
>
> 1) It can fail with this error for a live tuple
>
> OR,
>
> 2) It can also fail with this error if a tuple (that went through
> update) is marked as HEAP_HOT_UPDATED or HEAP_ONLY_TUPLE.
>
> OR,
>
> 3) If there are any concurrent transactions, then the tuple might be
> marked as HEAPTUPLE_INSERT_IN_PROGRESS or HEAPTUPLE_DELETE_IN_PROGRESS
> or HEAPTUPLE_RECENTLY_DEAD in which case also VACUUM can fail with
> this error.
>
> Now, AFAIU, as we will be dealing with a damaged table, the chances of
> point #3 being the cause of this error looks impossible in our case
> because I don't think we will be doing anything in parallel when
> performing surgery on a damaged table, in fact we shouldn't be doing
> any such things. However, it is quite possible that reason #2 could
> cause VACUUM to fail with this sort of error, but, as we are already
> skipping redirected item pointers in heap_force_common(), I think, we
> would never be marking HEAP_HOT_UPDATED tuple as frozen and I don't
> see any problem in marking HEAP_ONLY_TUPLE as frozen. So, probably, we
> may not need to handle point #2 as well.
>
> Further, I also don't see VACUUM reporting this error for a tuple that
> has been moved from one partition to another. So, I think we might not
> need to do any special handling for a tuple that got updated and its
> new version was moved to another partition.
>
> If you feel I am missing something here, please correct me. Thank you.
>
> Moreover, while I was exploring on above, I noticed that in
> lazy_scan_heap(), before we call HeapTupleSatisfiesVacuum() we check
> for a redirected item pointers and if any redirected item pointer is
> detected we do not call HeapTupleSatisfiesVacuum(). So, not sure how
> HeapTupleSatisfiesVacuum would ever return a dead tuple that is marked
> with HEAP_HOT_UPDATED. I am referring to the following code in
> lazy_scan_heap().
>
> for (offnum = FirstOffsetNumber;
> offnum <= maxoff;
> offnum = OffsetNumberNext(offnum))
> {
> ItemId itemid;
>
> itemid = PageGetItemId(page, offnum);
>
> .............
> .............
>
>
> /* Redirect items mustn't be touched */ <-- this check
> would bypass the redirected item pointers from being checked for
> HeapTupleSatisfiesVacuum.
> if (ItemIdIsRedirected(itemid))
> {
> hastup = true; /* this page won't be truncatable */
> continue;
> }
>
> ..............
> ..............
>
> switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
> {
> case HEAPTUPLE_DEAD:
>
> if (HeapTupleIsHotUpdated(&tuple) ||
> HeapTupleIsHeapOnly(&tuple) ||
> params->index_cleanup == VACOPT_TERNARY_DISABLED)
> nkeep += 1;
> else
> tupgone = true; /* we can delete the tuple */
> ..............
> ..............
> }
>
>
> So, the point is, would HeapTupleIsHotUpdated(&tuple) ever be true?
>
> --
> With Regards,
> Ashutosh Sharma
> EnterpriseDB:http://www.enterprisedb.com
>
> On Mon, Aug 17, 2020 at 11:35 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Hello Masahiko-san,
> >
> > Thanks for the review. Please check the comments inline below:
> >
> > On Fri, Aug 14, 2020 at 10:07 AM Masahiko Sawada
> > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > > Thank you for updating the patch! Here are my comments on v5 patch:
> > >
> > > --- a/contrib/Makefile
> > > +++ b/contrib/Makefile
> > > @@ -35,6 +35,7 @@ SUBDIRS = \
> > > pg_standby \
> > > pg_stat_statements \
> > > pg_trgm \
> > > + pg_surgery \
> > > pgcrypto \
> > >
> > > I guess we use alphabetical order here. So pg_surgery should be placed
> > > before pg_trgm.
> > >
> >
> > Okay, will take care of this in the next version of patch.
> >
> > > ---
> > > + if (heap_force_opt == HEAP_FORCE_KILL)
> > > + ItemIdSetDead(itemid);
> > >
> > > I think that if the page is an all-visible page, we should clear an
> > > all-visible bit on the visibility map corresponding to the page and
> > > PD_ALL_VISIBLE on the page header. Otherwise, index only scan would
> > > return the wrong results.
> > >
> >
> > I think we should let VACUUM do that. Please note that this module is
> > intended to be used only on a damaged relation and should only be
> > operated on damaged tuples of such relations. And the execution of any
> > of the functions provided by this module on a damaged relation must be
> > followed by VACUUM with DISABLE_PAGE_SKIPPING option on that relation.
> > This is necessary to bring back a damaged relation to the sane state
> > once a surgery is performed on it. I will try to add this note in the
> > documentation for this module.
> >
> > > ---
> > > + /*
> > > + * We do not mark the buffer dirty or do WAL logging for unmodifed
> > > + * pages.
> > > + */
> > > + if (!did_modify_page)
> > > + goto skip_wal;
> > > +
> > > + /* Mark buffer dirty before we write WAL. */
> > > + MarkBufferDirty(buf);
> > > +
> > > + /* XLOG stuff */
> > > + if (RelationNeedsWAL(rel))
> > > + log_newpage_buffer(buf, true);
> > > +
> > > +skip_wal:
> > > + END_CRIT_SECTION();
> > > +
> > >
> > > s/unmodifed/unmodified/
> > >
> >
> > okay, will fix this typo.
> >
> > > Do we really need to use goto? I think we can modify it like follows:
> > >
> > > if (did_modity_page)
> > > {
> > > /* Mark buffer dirty before we write WAL. */
> > > MarkBufferDirty(buf);
> > >
> > > /* XLOG stuff */
> > > if (RelationNeedsWAL(rel))
> > > log_newpage_buffer(buf, true);
> > > }
> > >
> > > END_CRIT_SECTION();
> > >
> >
> > No, we don't need it. We can achieve the same by checking the status
> > of did_modify_page flag as you suggested. I will do this change in the
> > next version.
> >
> > > ---
> > > pg_force_freeze() can revival a tuple that is already deleted but not
> > > vacuumed yet. Therefore, the user might need to reindex indexes after
> > > using that function. For instance, with the following script, the last
> > > two queries: index scan and seq scan, will return different results.
> > >
> > > set enable_seqscan to off;
> > > set enable_bitmapscan to off;
> > > set enable_indexonlyscan to off;
> > > create table tbl (a int primary key);
> > > insert into tbl values (1);
> > >
> > > update tbl set a = a + 100 where a = 1;
> > >
> > > explain analyze select * from tbl where a < 200;
> > >
> > > -- revive deleted tuple on heap
> > > select heap_force_freeze('tbl', array['(0,1)'::tid]);
> > >
> > > -- index scan returns 2 tuples
> > > explain analyze select * from tbl where a < 200;
> > >
> > > -- seq scan returns 1 tuple
> > > set enable_seqscan to on;
> > > explain analyze select * from tbl;
> > >
> >
> > I am not sure if this is the right use-case of pg_force_freeze
> > function. I think we should only be running pg_force_freeze function
> > on a tuple for which VACUUM reports "found xmin ABC from before
> > relfrozenxid PQR" sort of error otherwise it might worsen the things
> > instead of making it better. Now, the question is - can VACUUM report
> > this type of error for a deleted tuple or it would only report it for
> > a live tuple? AFAIU this won't be reported for the deleted tuples
> > because VACUUM wouldn't consider freezing a tuple that has been
> > deleted.
> >
> > > Also, if a tuple updated and moved to another partition is revived by
> > > heap_force_freeze(), its ctid still has special values:
> > > MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't
> > > see a problem yet caused by a visible tuple having the special ctid
> > > value, but it might be worth considering either to reset ctid value as
> > > well or to not freezing already-deleted tuple.
> > >
> >
> > For this as well, the answer remains the same as above.
> >
> > --
> > With Regards,
> > Ashutosh Sharma
> > EnterpriseDB:http://www.enterprisedb.com

Attachment Content-Type Size
v6-0001-Add-contrib-pg_surgery-to-perform-surgery-on-a-damag.patch text/x-patch 26.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Travers 2020-08-18 11:29:05 Re: recovering from "found xmin ... from before relfrozenxid ..."
Previous Message Anna Akenteva 2020-08-18 10:12:51 Re: [HACKERS] make async slave to wait for lsn to be replayed