Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-08-25 12:38:38
Message-ID: CAE9k0P=QAmVBs=D+SdtNn=G_1_-f_MC9sAXK=SYsSy2BPedhcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Masahiko-san,

Thank you for the review. Please check my comments inline below:

On Tue, Aug 25, 2020 at 1:39 PM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Fri, 21 Aug 2020 at 22:25, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Hi Masahiko-san,
> >
> > Please find the updated patch with the following new changes:
> >
>
> Thank you for updating the patch!
>
> > 1) It adds the code changes in heap_force_kill function to clear an
> > all-visible bit on the visibility map corresponding to the page that
> > is marked all-visible. Along the way it also clears PD_ALL_VISIBLE
> > flag on the page header.
>
> I think we need to clear all visibility map bits by using
> VISIBILITYMAP_VALID_BITS. Otherwise, the page has all-frozen bit but
> not all-visible bit, which is not a valid state.
>

Yeah, makes sense, I will do that change in the next version of patch.

> >
> > 2) It adds the code changes in heap_force_freeze function to reset the
> > ctid value in a tuple header if it is corrupted.
> >
> > 3) It adds several notes and examples in the documentation stating
> > when and how we need to use the functions provided by this module.
> >
> > Please have a look and let me know for any other concern.
> >
>
> And here are small comments on the heap_surgery.c:
>
> + /*
> + * Get the offset numbers from the tids belonging to one particular page
> + * and process them one by one.
> + */
> + blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> + offnos);
> +
> + /* Calculate the number of offsets stored in offnos array. */
> + noffs = next_start_ptr - curr_start_ptr;
> +
> + /*
> + * Update the current start pointer so that next time when
> + * tids_same_page_fetch_offnums() is called, we can calculate the number
> + * of offsets present in the offnos array.
> + */
> + curr_start_ptr = next_start_ptr;
> +
> + /* Check whether the block number is valid. */
> + if (blkno >= nblocks)
> + {
> + ereport(NOTICE,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("skipping block %u for relation \"%s\"
> because the block number is out of range",
> + blkno, RelationGetRelationName(rel))));
> + continue;
> + }
> +
> + CHECK_FOR_INTERRUPTS();
>
> I guess it would be better to call CHECK_FOR_INTERRUPTS() at the top
> of the do loop for safety. I think it's unlikely to happen but the
> user might mistakenly specify a lot of wrong block numbers.
>

Okay, np, will shift it to top of the do loop.

> ----
> + offnos = (OffsetNumber *) palloc(ntids * sizeof(OffsetNumber));
> + noffs = curr_start_ptr = next_start_ptr = 0;
> + nblocks = RelationGetNumberOfBlocks(rel);
> +
> + do
> + {
>
> (snip)
>
> +
> + /*
> + * Get the offset numbers from the tids belonging to one particular page
> + * and process them one by one.
> + */
> + blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
> + offnos);
> +
> + /* Calculate the number of offsets stored in offnos array. */
> + noffs = next_start_ptr - curr_start_ptr;
> +
>
> (snip)
>
> + /* No ereport(ERROR) from here until all the changes are logged. */
> + START_CRIT_SECTION();
> +
> + for (i = 0; i < noffs; i++)
>
> You copy all offset numbers belonging to the same page to palloc'd
> array, offnos, and iterate it while processing the tuples. I might be
> missing something but I think we can do that without allocating the
> space for offset numbers. Is there any reason for this? I guess we can
> do that by just iterating the sorted tids array.
>

Hmmm.. okay, I see your point. I think probably what you are trying to
suggest here is to make use of the current and next start pointers to
get the tids belonging to the same page and process them one by one
instead of fetching the offset numbers of all tids belonging to one
page into the offnos array and then iterate through the offnos array.
I think that is probably possible and I will try to do that in the
next version of patch. If there is something else that you have in
your mind, please let me know.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2020-08-25 12:51:37 Fix a couple of misuages of bms_num_members()
Previous Message Daniel Gustafsson 2020-08-25 12:28:20 Re: Autovacuum on partitioned table (autoanalyze)