Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-08-19 10:25:03
Message-ID: CA+fd4k66r7H15i=f=Qd0Cziuim1i8KxEf7S+5EEbnha0A0NQ4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 19 Aug 2020 at 15:09, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> On Wed, Aug 19, 2020 at 9:27 AM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > On Mon, 17 Aug 2020 at 15:05, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> > >
> > > > pg_force_freeze() can revival a tuple that is already deleted but not
> > > > vacuumed yet. Therefore, the user might need to reindex indexes after
> > > > using that function. For instance, with the following script, the last
> > > > two queries: index scan and seq scan, will return different results.
> > > >
> > > > set enable_seqscan to off;
> > > > set enable_bitmapscan to off;
> > > > set enable_indexonlyscan to off;
> > > > create table tbl (a int primary key);
> > > > insert into tbl values (1);
> > > >
> > > > update tbl set a = a + 100 where a = 1;
> > > >
> > > > explain analyze select * from tbl where a < 200;
> > > >
> > > > -- revive deleted tuple on heap
> > > > select heap_force_freeze('tbl', array['(0,1)'::tid]);
> > > >
> > > > -- index scan returns 2 tuples
> > > > explain analyze select * from tbl where a < 200;
> > > >
> > > > -- seq scan returns 1 tuple
> > > > set enable_seqscan to on;
> > > > explain analyze select * from tbl;
> > > >
> > >
> > > I am not sure if this is the right use-case of pg_force_freeze
> > > function. I think we should only be running pg_force_freeze function
> > > on a tuple for which VACUUM reports "found xmin ABC from before
> > > relfrozenxid PQR" sort of error otherwise it might worsen the things
> > > instead of making it better.
> >
> > Should this also be documented? I think that it's hard to force the
> > user to always use this module in the right situation but we need to
> > show at least when to use.
> >
>
> I've already added some examples in the documentation explaining the
> use-case of force_freeze function. If required, I will also add a note
> about it.
>
> > > > Also, if a tuple updated and moved to another partition is revived by
> > > > heap_force_freeze(), its ctid still has special values:
> > > > MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't
> > > > see a problem yet caused by a visible tuple having the special ctid
> > > > value, but it might be worth considering either to reset ctid value as
> > > > well or to not freezing already-deleted tuple.
> > > >
> > >
> > > For this as well, the answer remains the same as above.
> >
> > Perhaps the same is true when a tuple header is corrupted including
> > xmin and ctid for some reason and the user wants to fix it? I'm
> > concerned that a live tuple having the wrong ctid will cause SEGV or
> > PANIC error in the future.
> >
>
> If a tuple header itself is corrupted, then I think we must kill that
> tuple. If only xmin and t_ctid fields are corrupted, then probably we
> can think of resetting the ctid value of that tuple. However, it won't
> be always possible to detect the corrupted ctid value. It's quite
> possible that the corrupted ctid field has valid values for block
> number and offset number in it, but it's actually corrupted and it
> would be difficult to consider such ctid as corrupted. Hence, we can't
> do anything about such types of corruption. Probably in such cases we
> need to run VACUUM FULL on such tables so that new ctid gets generated
> for each tuple in the table.

Understood.

Perhaps such corruption will be able to be detected by new heapam
check functions discussed on another thread. My point was that it
might be better to attempt making the tuple header sane state as much
as possible when fixing a live tuple in order to prevent further
problems such as databases crash by malicious attackers.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2020-08-19 10:28:15 Re: [Patch] ALTER SYSTEM READ ONLY
Previous Message Fujii Masao 2020-08-19 09:12:02 Re: Creating a function for exposing memory usage of backend process