Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-08-20 06:13:49
Message-ID: CAE9k0P=wxSw7+LRSCGQi_fbi6RYvZghE8t0_VNq6eh52uRjm-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 20, 2020 at 11:04 AM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Wed, 19 Aug 2020 at 20:45, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > On Wed, Aug 19, 2020 at 3:55 PM Masahiko Sawada
> > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > >
> > > On Wed, 19 Aug 2020 at 15:09, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 9:27 AM Masahiko Sawada
> > > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > > >
> > > > > On Mon, 17 Aug 2020 at 15:05, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > > pg_force_freeze() can revival a tuple that is already deleted but not
> > > > > > > vacuumed yet. Therefore, the user might need to reindex indexes after
> > > > > > > using that function. For instance, with the following script, the last
> > > > > > > two queries: index scan and seq scan, will return different results.
> > > > > > >
> > > > > > > set enable_seqscan to off;
> > > > > > > set enable_bitmapscan to off;
> > > > > > > set enable_indexonlyscan to off;
> > > > > > > create table tbl (a int primary key);
> > > > > > > insert into tbl values (1);
> > > > > > >
> > > > > > > update tbl set a = a + 100 where a = 1;
> > > > > > >
> > > > > > > explain analyze select * from tbl where a < 200;
> > > > > > >
> > > > > > > -- revive deleted tuple on heap
> > > > > > > select heap_force_freeze('tbl', array['(0,1)'::tid]);
> > > > > > >
> > > > > > > -- index scan returns 2 tuples
> > > > > > > explain analyze select * from tbl where a < 200;
> > > > > > >
> > > > > > > -- seq scan returns 1 tuple
> > > > > > > set enable_seqscan to on;
> > > > > > > explain analyze select * from tbl;
> > > > > > >
> > > > > >
> > > > > > I am not sure if this is the right use-case of pg_force_freeze
> > > > > > function. I think we should only be running pg_force_freeze function
> > > > > > on a tuple for which VACUUM reports "found xmin ABC from before
> > > > > > relfrozenxid PQR" sort of error otherwise it might worsen the things
> > > > > > instead of making it better.
> > > > >
> > > > > Should this also be documented? I think that it's hard to force the
> > > > > user to always use this module in the right situation but we need to
> > > > > show at least when to use.
> > > > >
> > > >
> > > > I've already added some examples in the documentation explaining the
> > > > use-case of force_freeze function. If required, I will also add a note
> > > > about it.
> > > >
> > > > > > > Also, if a tuple updated and moved to another partition is revived by
> > > > > > > heap_force_freeze(), its ctid still has special values:
> > > > > > > MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't
> > > > > > > see a problem yet caused by a visible tuple having the special ctid
> > > > > > > value, but it might be worth considering either to reset ctid value as
> > > > > > > well or to not freezing already-deleted tuple.
> > > > > > >
> > > > > >
> > > > > > For this as well, the answer remains the same as above.
> > > > >
> > > > > Perhaps the same is true when a tuple header is corrupted including
> > > > > xmin and ctid for some reason and the user wants to fix it? I'm
> > > > > concerned that a live tuple having the wrong ctid will cause SEGV or
> > > > > PANIC error in the future.
> > > > >
> > > >
> > > > If a tuple header itself is corrupted, then I think we must kill that
> > > > tuple. If only xmin and t_ctid fields are corrupted, then probably we
> > > > can think of resetting the ctid value of that tuple. However, it won't
> > > > be always possible to detect the corrupted ctid value. It's quite
> > > > possible that the corrupted ctid field has valid values for block
> > > > number and offset number in it, but it's actually corrupted and it
> > > > would be difficult to consider such ctid as corrupted. Hence, we can't
> > > > do anything about such types of corruption. Probably in such cases we
> > > > need to run VACUUM FULL on such tables so that new ctid gets generated
> > > > for each tuple in the table.
> > >
> > > Understood.
> > >
> > > Perhaps such corruption will be able to be detected by new heapam
> > > check functions discussed on another thread. My point was that it
> > > might be better to attempt making the tuple header sane state as much
> > > as possible when fixing a live tuple in order to prevent further
> > > problems such as databases crash by malicious attackers.
> > >
> >
> > Agreed. So, to handle the ctid related concern that you raised, I'm
> > planning to do the following changes to ensure that the tuple being
> > marked as frozen contains the correct item pointer value. Please let
> > me know if you are okay with these changes.
>
> Given that a live tuple never indicates to ve moved partitions, I
> guess the first condition in the if statement is not necessary. The
> rest looks good to me, although other hackers might think differently.
>

Okay, thanks for confirming. I am planning to go ahead with this
approach. Will later see what others have to say about it.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-08-20 06:24:57 Re: Implement UNLOGGED clause for COPY FROM
Previous Message Michael Paquier 2020-08-20 06:11:19 Re: remove spurious CREATE INDEX CONCURRENTLY wait