Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-08-21 13:24:58
Message-ID: CAE9k0P=vu8uaQf=HsE=k2q4k2X7mVbjUtH81UGC6HX5B2UPqnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Masahiko-san,

Please find the updated patch with the following new changes:

1) It adds the code changes in heap_force_kill function to clear an
all-visible bit on the visibility map corresponding to the page that
is marked all-visible. Along the way it also clears PD_ALL_VISIBLE
flag on the page header.

2) It adds the code changes in heap_force_freeze function to reset the
ctid value in a tuple header if it is corrupted.

3) It adds several notes and examples in the documentation stating
when and how we need to use the functions provided by this module.

Please have a look and let me know for any other concern.

Thanks,

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Thu, Aug 20, 2020 at 11:43 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> On Thu, Aug 20, 2020 at 11:04 AM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > On Wed, 19 Aug 2020 at 20:45, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Aug 19, 2020 at 3:55 PM Masahiko Sawada
> > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > >
> > > > On Wed, 19 Aug 2020 at 15:09, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Wed, Aug 19, 2020 at 9:27 AM Masahiko Sawada
> > > > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > > > >
> > > > > > On Mon, 17 Aug 2020 at 15:05, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > > pg_force_freeze() can revival a tuple that is already deleted but not
> > > > > > > > vacuumed yet. Therefore, the user might need to reindex indexes after
> > > > > > > > using that function. For instance, with the following script, the last
> > > > > > > > two queries: index scan and seq scan, will return different results.
> > > > > > > >
> > > > > > > > set enable_seqscan to off;
> > > > > > > > set enable_bitmapscan to off;
> > > > > > > > set enable_indexonlyscan to off;
> > > > > > > > create table tbl (a int primary key);
> > > > > > > > insert into tbl values (1);
> > > > > > > >
> > > > > > > > update tbl set a = a + 100 where a = 1;
> > > > > > > >
> > > > > > > > explain analyze select * from tbl where a < 200;
> > > > > > > >
> > > > > > > > -- revive deleted tuple on heap
> > > > > > > > select heap_force_freeze('tbl', array['(0,1)'::tid]);
> > > > > > > >
> > > > > > > > -- index scan returns 2 tuples
> > > > > > > > explain analyze select * from tbl where a < 200;
> > > > > > > >
> > > > > > > > -- seq scan returns 1 tuple
> > > > > > > > set enable_seqscan to on;
> > > > > > > > explain analyze select * from tbl;
> > > > > > > >
> > > > > > >
> > > > > > > I am not sure if this is the right use-case of pg_force_freeze
> > > > > > > function. I think we should only be running pg_force_freeze function
> > > > > > > on a tuple for which VACUUM reports "found xmin ABC from before
> > > > > > > relfrozenxid PQR" sort of error otherwise it might worsen the things
> > > > > > > instead of making it better.
> > > > > >
> > > > > > Should this also be documented? I think that it's hard to force the
> > > > > > user to always use this module in the right situation but we need to
> > > > > > show at least when to use.
> > > > > >
> > > > >
> > > > > I've already added some examples in the documentation explaining the
> > > > > use-case of force_freeze function. If required, I will also add a note
> > > > > about it.
> > > > >
> > > > > > > > Also, if a tuple updated and moved to another partition is revived by
> > > > > > > > heap_force_freeze(), its ctid still has special values:
> > > > > > > > MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't
> > > > > > > > see a problem yet caused by a visible tuple having the special ctid
> > > > > > > > value, but it might be worth considering either to reset ctid value as
> > > > > > > > well or to not freezing already-deleted tuple.
> > > > > > > >
> > > > > > >
> > > > > > > For this as well, the answer remains the same as above.
> > > > > >
> > > > > > Perhaps the same is true when a tuple header is corrupted including
> > > > > > xmin and ctid for some reason and the user wants to fix it? I'm
> > > > > > concerned that a live tuple having the wrong ctid will cause SEGV or
> > > > > > PANIC error in the future.
> > > > > >
> > > > >
> > > > > If a tuple header itself is corrupted, then I think we must kill that
> > > > > tuple. If only xmin and t_ctid fields are corrupted, then probably we
> > > > > can think of resetting the ctid value of that tuple. However, it won't
> > > > > be always possible to detect the corrupted ctid value. It's quite
> > > > > possible that the corrupted ctid field has valid values for block
> > > > > number and offset number in it, but it's actually corrupted and it
> > > > > would be difficult to consider such ctid as corrupted. Hence, we can't
> > > > > do anything about such types of corruption. Probably in such cases we
> > > > > need to run VACUUM FULL on such tables so that new ctid gets generated
> > > > > for each tuple in the table.
> > > >
> > > > Understood.
> > > >
> > > > Perhaps such corruption will be able to be detected by new heapam
> > > > check functions discussed on another thread. My point was that it
> > > > might be better to attempt making the tuple header sane state as much
> > > > as possible when fixing a live tuple in order to prevent further
> > > > problems such as databases crash by malicious attackers.
> > > >
> > >
> > > Agreed. So, to handle the ctid related concern that you raised, I'm
> > > planning to do the following changes to ensure that the tuple being
> > > marked as frozen contains the correct item pointer value. Please let
> > > me know if you are okay with these changes.
> >
> > Given that a live tuple never indicates to ve moved partitions, I
> > guess the first condition in the if statement is not necessary. The
> > rest looks good to me, although other hackers might think differently.
> >
>
> Okay, thanks for confirming. I am planning to go ahead with this
> approach. Will later see what others have to say about it.
>
> --
> With Regards,
> Ashutosh Sharma
> EnterpriseDB:http://www.enterprisedb.com

Attachment Content-Type Size
v7-0001-Add-contrib-pg_surgery-to-perform-surgery-on-a-damag.patch text/x-patch 28.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2020-08-21 14:27:06 Re: Creating a function for exposing memory usage of backend process
Previous Message Tatsuo Ishii 2020-08-21 12:40:50 Re: Implementing Incremental View Maintenance