Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-08-25 12:17:05
Message-ID: CA+fd4k6yQbjF=wxLShpBNF7pstsOdqCUXhSNv-v698LLYaYrEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 25 Aug 2020 at 17:08, Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Fri, 21 Aug 2020 at 22:25, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Hi Masahiko-san,
> >
> > Please find the updated patch with the following new changes:
> >
>
> Thank you for updating the patch!
>
> > 1) It adds the code changes in heap_force_kill function to clear an
> > all-visible bit on the visibility map corresponding to the page that
> > is marked all-visible. Along the way it also clears PD_ALL_VISIBLE
> > flag on the page header.
>
> I think we need to clear all visibility map bits by using
> VISIBILITYMAP_VALID_BITS. Otherwise, the page has all-frozen bit but
> not all-visible bit, which is not a valid state.
>
> >
> > 2) It adds the code changes in heap_force_freeze function to reset the
> > ctid value in a tuple header if it is corrupted.
> >
> > 3) It adds several notes and examples in the documentation stating
> > when and how we need to use the functions provided by this module.
> >
> > Please have a look and let me know for any other concern.
> >
>
> And here are small comments on the heap_surgery.c:
>
> + /*
> + * Get the offset numbers from the tids belonging to one particular page
> + * and process them one by one.
> + */
> + blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> + offnos);
> +
> + /* Calculate the number of offsets stored in offnos array. */
> + noffs = next_start_ptr - curr_start_ptr;
> +
> + /*
> + * Update the current start pointer so that next time when
> + * tids_same_page_fetch_offnums() is called, we can calculate the number
> + * of offsets present in the offnos array.
> + */
> + curr_start_ptr = next_start_ptr;
> +
> + /* Check whether the block number is valid. */
> + if (blkno >= nblocks)
> + {
> + ereport(NOTICE,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("skipping block %u for relation \"%s\"
> because the block number is out of range",
> + blkno, RelationGetRelationName(rel))));
> + continue;
> + }
> +
> + CHECK_FOR_INTERRUPTS();
>
> I guess it would be better to call CHECK_FOR_INTERRUPTS() at the top
> of the do loop for safety. I think it's unlikely to happen but the
> user might mistakenly specify a lot of wrong block numbers.
>
> ----
> + offnos = (OffsetNumber *) palloc(ntids * sizeof(OffsetNumber));
> + noffs = curr_start_ptr = next_start_ptr = 0;
> + nblocks = RelationGetNumberOfBlocks(rel);
> +
> + do
> + {
>
> (snip)
>
> +
> + /*
> + * Get the offset numbers from the tids belonging to one particular page
> + * and process them one by one.
> + */
> + blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> + offnos);
> +
> + /* Calculate the number of offsets stored in offnos array. */
> + noffs = next_start_ptr - curr_start_ptr;
> +
>
> (snip)
>
> + /* No ereport(ERROR) from here until all the changes are logged. */
> + START_CRIT_SECTION();
> +
> + for (i = 0; i < noffs; i++)
>
> You copy all offset numbers belonging to the same page to palloc'd
> array, offnos, and iterate it while processing the tuples. I might be
> missing something but I think we can do that without allocating the
> space for offset numbers. Is there any reason for this? I guess we can
> do that by just iterating the sorted tids array.
>

Let me share other comments on the latest version patch:

Some words need to be tagged. For instance, I found the following words:

VACUUM
DISABLE_PAGE_SKIPPING
HEAP_XMIN_FROZEN
HEAP_XMAX_INVALID

---
+test=# select ctid from t1 where xmin = 507;
+ ctid
+-------
+ (0,3)
+(1 row)
+
+test=# select heap_force_freeze('t1'::regclass, ARRAY['(0, 3)']::tid[]);
+-[ RECORD 1 ]-----+-
+heap_force_freeze |

I think it's better to use a consistent output format. The former uses
the normal format whereas the latter uses the expanded format.

---
+ <note>
+ <para>
+ While performing surgery on a damaged relation, we must not be doing anything
+ else on that relation in parallel. This is to ensure that when we are
+ operating on a damaged tuple there is no other transaction trying to modify
+ that tuple.
+ </para>
+ </note>

If we prefer to avoid concurrent operations on the target relation why
don't we use AccessExclusiveLock?

---
+CREATE FUNCTION heap_force_kill(reloid regclass, tids tid[])
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'heap_force_kill'
+LANGUAGE C STRICT;

+CREATE FUNCTION heap_force_freeze(reloid regclass, tids tid[])
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'heap_force_freeze'
+LANGUAGE C STRICT;

I think these functions should be PARALLEL UNSAFE.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2020-08-25 12:28:20 Re: Autovacuum on partitioned table (autoanalyze)
Previous Message Daniel Gustafsson 2020-08-25 11:48:39 Re: passwordcheck: Log cracklib diagnostics