Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-08-19 03:56:45
Message-ID: CA+fd4k6HKYiE7ng+VbnAbEWKp6A-Gth+UoXjW5LvkJPem0kC=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 17 Aug 2020 at 15:05, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> Hello Masahiko-san,
>
> Thanks for the review. Please check the comments inline below:
>
> On Fri, Aug 14, 2020 at 10:07 AM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> > Thank you for updating the patch! Here are my comments on v5 patch:
> >
> > --- a/contrib/Makefile
> > +++ b/contrib/Makefile
> > @@ -35,6 +35,7 @@ SUBDIRS = \
> > pg_standby \
> > pg_stat_statements \
> > pg_trgm \
> > + pg_surgery \
> > pgcrypto \
> >
> > I guess we use alphabetical order here. So pg_surgery should be placed
> > before pg_trgm.
> >
>
> Okay, will take care of this in the next version of patch.
>
> > ---
> > + if (heap_force_opt == HEAP_FORCE_KILL)
> > + ItemIdSetDead(itemid);
> >
> > I think that if the page is an all-visible page, we should clear an
> > all-visible bit on the visibility map corresponding to the page and
> > PD_ALL_VISIBLE on the page header. Otherwise, index only scan would
> > return the wrong results.
> >
>
> I think we should let VACUUM do that. Please note that this module is
> intended to be used only on a damaged relation and should only be
> operated on damaged tuples of such relations. And the execution of any
> of the functions provided by this module on a damaged relation must be
> followed by VACUUM with DISABLE_PAGE_SKIPPING option on that relation.
> This is necessary to bring back a damaged relation to the sane state
> once a surgery is performed on it. I will try to add this note in the
> documentation for this module.
>
> > ---
> > + /*
> > + * We do not mark the buffer dirty or do WAL logging for unmodifed
> > + * pages.
> > + */
> > + if (!did_modify_page)
> > + goto skip_wal;
> > +
> > + /* Mark buffer dirty before we write WAL. */
> > + MarkBufferDirty(buf);
> > +
> > + /* XLOG stuff */
> > + if (RelationNeedsWAL(rel))
> > + log_newpage_buffer(buf, true);
> > +
> > +skip_wal:
> > + END_CRIT_SECTION();
> > +
> >
> > s/unmodifed/unmodified/
> >
>
> okay, will fix this typo.
>
> > Do we really need to use goto? I think we can modify it like follows:
> >
> > if (did_modity_page)
> > {
> > /* Mark buffer dirty before we write WAL. */
> > MarkBufferDirty(buf);
> >
> > /* XLOG stuff */
> > if (RelationNeedsWAL(rel))
> > log_newpage_buffer(buf, true);
> > }
> >
> > END_CRIT_SECTION();
> >
>
> No, we don't need it. We can achieve the same by checking the status
> of did_modify_page flag as you suggested. I will do this change in the
> next version.
>
> > ---
> > pg_force_freeze() can revival a tuple that is already deleted but not
> > vacuumed yet. Therefore, the user might need to reindex indexes after
> > using that function. For instance, with the following script, the last
> > two queries: index scan and seq scan, will return different results.
> >
> > set enable_seqscan to off;
> > set enable_bitmapscan to off;
> > set enable_indexonlyscan to off;
> > create table tbl (a int primary key);
> > insert into tbl values (1);
> >
> > update tbl set a = a + 100 where a = 1;
> >
> > explain analyze select * from tbl where a < 200;
> >
> > -- revive deleted tuple on heap
> > select heap_force_freeze('tbl', array['(0,1)'::tid]);
> >
> > -- index scan returns 2 tuples
> > explain analyze select * from tbl where a < 200;
> >
> > -- seq scan returns 1 tuple
> > set enable_seqscan to on;
> > explain analyze select * from tbl;
> >
>
> I am not sure if this is the right use-case of pg_force_freeze
> function. I think we should only be running pg_force_freeze function
> on a tuple for which VACUUM reports "found xmin ABC from before
> relfrozenxid PQR" sort of error otherwise it might worsen the things
> instead of making it better.

Should this also be documented? I think that it's hard to force the
user to always use this module in the right situation but we need to
show at least when to use.

> > Also, if a tuple updated and moved to another partition is revived by
> > heap_force_freeze(), its ctid still has special values:
> > MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't
> > see a problem yet caused by a visible tuple having the special ctid
> > value, but it might be worth considering either to reset ctid value as
> > well or to not freezing already-deleted tuple.
> >
>
> For this as well, the answer remains the same as above.

Perhaps the same is true when a tuple header is corrupted including
xmin and ctid for some reason and the user wants to fix it? I'm
concerned that a live tuple having the wrong ctid will cause SEGV or
PANIC error in the future.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-08-19 04:17:28 Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Previous Message David Rowley 2020-08-19 03:48:31 Re: Hybrid Hash/Nested Loop joins and caching results from subplans