Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-08-17 06:05:06
Message-ID: CAE9k0Pk2TCTjGqr6yrX1tQNG66zJfdnSSveTqM2cW1R-Jcn30g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Masahiko-san,

Thanks for the review. Please check the comments inline below:

On Fri, Aug 14, 2020 at 10:07 AM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:

> Thank you for updating the patch! Here are my comments on v5 patch:
>
> --- a/contrib/Makefile
> +++ b/contrib/Makefile
> @@ -35,6 +35,7 @@ SUBDIRS = \
> pg_standby \
> pg_stat_statements \
> pg_trgm \
> + pg_surgery \
> pgcrypto \
>
> I guess we use alphabetical order here. So pg_surgery should be placed
> before pg_trgm.
>

Okay, will take care of this in the next version of patch.

> ---
> + if (heap_force_opt == HEAP_FORCE_KILL)
> + ItemIdSetDead(itemid);
>
> I think that if the page is an all-visible page, we should clear an
> all-visible bit on the visibility map corresponding to the page and
> PD_ALL_VISIBLE on the page header. Otherwise, index only scan would
> return the wrong results.
>

I think we should let VACUUM do that. Please note that this module is
intended to be used only on a damaged relation and should only be
operated on damaged tuples of such relations. And the execution of any
of the functions provided by this module on a damaged relation must be
followed by VACUUM with DISABLE_PAGE_SKIPPING option on that relation.
This is necessary to bring back a damaged relation to the sane state
once a surgery is performed on it. I will try to add this note in the
documentation for this module.

> ---
> + /*
> + * We do not mark the buffer dirty or do WAL logging for unmodifed
> + * pages.
> + */
> + if (!did_modify_page)
> + goto skip_wal;
> +
> + /* Mark buffer dirty before we write WAL. */
> + MarkBufferDirty(buf);
> +
> + /* XLOG stuff */
> + if (RelationNeedsWAL(rel))
> + log_newpage_buffer(buf, true);
> +
> +skip_wal:
> + END_CRIT_SECTION();
> +
>
> s/unmodifed/unmodified/
>

okay, will fix this typo.

> Do we really need to use goto? I think we can modify it like follows:
>
> if (did_modity_page)
> {
> /* Mark buffer dirty before we write WAL. */
> MarkBufferDirty(buf);
>
> /* XLOG stuff */
> if (RelationNeedsWAL(rel))
> log_newpage_buffer(buf, true);
> }
>
> END_CRIT_SECTION();
>

No, we don't need it. We can achieve the same by checking the status
of did_modify_page flag as you suggested. I will do this change in the
next version.

> ---
> pg_force_freeze() can revival a tuple that is already deleted but not
> vacuumed yet. Therefore, the user might need to reindex indexes after
> using that function. For instance, with the following script, the last
> two queries: index scan and seq scan, will return different results.
>
> set enable_seqscan to off;
> set enable_bitmapscan to off;
> set enable_indexonlyscan to off;
> create table tbl (a int primary key);
> insert into tbl values (1);
>
> update tbl set a = a + 100 where a = 1;
>
> explain analyze select * from tbl where a < 200;
>
> -- revive deleted tuple on heap
> select heap_force_freeze('tbl', array['(0,1)'::tid]);
>
> -- index scan returns 2 tuples
> explain analyze select * from tbl where a < 200;
>
> -- seq scan returns 1 tuple
> set enable_seqscan to on;
> explain analyze select * from tbl;
>

I am not sure if this is the right use-case of pg_force_freeze
function. I think we should only be running pg_force_freeze function
on a tuple for which VACUUM reports "found xmin ABC from before
relfrozenxid PQR" sort of error otherwise it might worsen the things
instead of making it better. Now, the question is - can VACUUM report
this type of error for a deleted tuple or it would only report it for
a live tuple? AFAIU this won't be reported for the deleted tuples
because VACUUM wouldn't consider freezing a tuple that has been
deleted.

> Also, if a tuple updated and moved to another partition is revived by
> heap_force_freeze(), its ctid still has special values:
> MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't
> see a problem yet caused by a visible tuple having the special ctid
> value, but it might be worth considering either to reset ctid value as
> well or to not freezing already-deleted tuple.
>

For this as well, the answer remains the same as above.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-08-17 06:07:39 Re: display offset along with block number in vacuum errors
Previous Message Drouvot, Bertrand 2020-08-17 05:49:12 Re: Display individual query in pg_stat_activity