Re: HOT chain validation in verify_heapam()

From: Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2023-01-22 15:18:23
Message-ID: CAPF61jCNEZCmM51YRL+N2MiYSnE+Sj11DPjELkeO57awU8PNaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 20, 2023 at 12:38 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
> I think that the handling of lp_valid[] in the loop that begins with
> "Loop over offset and populate predecessor array from all entries that
> are present in successor array" is very confusing. I think that
> lp_valid[] should be answering the question "is the line pointer
> basically sane?". That is, if it's a redirect, it needs to point to
> something within the line pointer array (and we also check that it
> must be an entry in the line pointer array that is used, which seems
> fine). If it's not a redirect, it needs to point to space that's
> entirely within the block, properly aligned, and big enough to contain
> a tuple. We determine the answers to all of these questions in the
> first loop, the one that starts with /* Perform tuple checks */.
>
> Nothing that happens in the second loop, where we populate the
> predecessor array, can reverse our previous conclusion that the line
> pointer is valid, so this loop shouldn't be resetting entries in
> lp_valid[] to false. The reason that it's doing so seems to be that it
> wants to use lp_valid[] to control the behavior of the third loop,
> where we perform checks against things that have entries in the
> predecessor array. As written, the code ensures that we always set
> lp_valid[nextoffnum] to false unless we set predecessor[nextoffnum] to
> a value other than InvalidOffsetNumber. But that is needlessly
> complex: the third loop doesn't need to look at lp_valid[] at all. It
> can just check whether predecessor[currentoffnum] is valid. If it is,
> perform checks. Otherwise, skip it. It seems to me that this would be
> significantly simpler.
>
I was trying to use lp_valid as I need to identify the root of the HOT
chain and we are doing validation on the root of the HOT chain when we loop
over the predecessor array.
if (nextoffnum == InvalidOffsetNumber ||
!lp_valid[ctx.offnum] || !lp_valid[nextoffnum])
{
/*
* Set lp_valid of nextoffnum to false if
current tuple's
* lp_valid is true. We don't add this to
predecessor array as
* it's of no use to validate tuple if its
predecessor is
* already corrupted but we need to
identify all those tuple's
* so that we can differentiate between all
the cases of
* missing offset in predecessor array,
this will help in
* validating the root of chain when we
loop over predecessor
* array.
*/
if (!lp_valid[ctx.offnum] &&
lp_valid[nextoffnum])
lp_valid[nextoffnum] = false;
Was resetting lp_valid in the last patch because we don't add data to
predecessor[] and while looping over the predecessor array we need to
isolate (and identify) all cases of missing data in the predecessor array
to exactly identify the root of HOT chain.
One solution is to always add data to predecessor array while looping over
successor array and then while looping over predecessor array we can
continue for other validation "if (lp_valid [predecessor[currentoffnum]] &&
lp_valid[currentoffnum]" is true but in this case also our third loop will
also look at lp_valid[].

To put the above complaint another way, a variable shouldn't mean two
> different things depending on where you are in the function. Right
> now, at the end of the first loop, lp_valid[x] answers the question
> "is line pointer x basically valid?". But by the end of the second
> loop, it answers the question "is line pointer x valid and does it
> also have a valid predecessor?". That kind of definitional change is
> something to be avoided.
>
> agree.

> The test if (pred_in_progress || TransactionIdDidCommit(curr_xmin))
> seems wrong to me. Shouldn't it be &&? Has this code been tested at
> all? It doesn't seem to have a test case. Some of these other errors
> don't, either. Maybe there's some that we can't easily test in an
> automated way, but we should test what we can. I guess maybe casual
> testing wouldn't reveal the problem here because of the recheck, but
> it's worrying to find logic that doesn't look right with no
> corresponding comments or test cases.
>
> This is totally my Mistake, apologies for that. I will fix this in my next
patch. Regarding the missing test cases, I need one in-progress transaction
for these test cases to be included in 004_verify_heapam.pl but I
don't find a clear way to have an in-progress transaction(as per the design
of 004_verify_heapam.pl ) that I can use in the test cases. I will be doing
more research on a solution to add these missing test cases.

> Some error message kibitizing:
>
> psprintf("redirected tuple at line pointer offset %u is not heap only
> tuple",
>
> It seems to me that this should say "redirected line pointer pointing
> to a non-heap-only tuple at offset %u". There is no such thing as a
> redirected tuple -- and even if there were, what we have here is
> clearly a redirected line pointer.
>
> psprintf("redirected tuple at line pointer offset %u is not heap only
> tuple",
>
> And I think for the same reasons this one should say something like
> "redirected line pointer pointing to a non-heap-only tuple at offset
> %u".
>
> psprintf("redirected tuple at line pointer offset %u is not heap
> updated tuple",
>
> Possibly all of these would sound better with "points" rather than
> "pointing" -- if so, we'd need to change an existing message in the
> same way.
>
> And this one should say something like "redirected line pointer
> pointing to a tuple not produced by an update at offset %u".
>
> psprintf("tuple is root of chain but it is marked as heap-only tuple"));
>
> I think this would sound better if you deleted the word "it".
>
> Will change accordingly in my next patch.

> I don't know whether it's worth arguing about -- it feels like we've
> argued too much already about this sort of thing -- but I am not very
> convinced by initializers like OffsetNumber
> predecessor[MaxOffsetNumber] = {InvalidOffsetNumber}. That style is
> only correct because InvalidOffsetNumber happens to be zero. If it
> were up to me, I'd use memset to clear the predecessor array. I would
> not bulk initialize sucessor and lp_valid but make sure that the first
> loop always sets them, possibly by having the top of the loop set them
> to InvalidOffsetNumber and false initially and then letting code later
> in the loop change the value, or possibly in some other way.
>
> agree, will fix in my next patch

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2023-01-22 15:41:32 Re: [PATCH] Use 128-bit math to accelerate numeric division, when 8 < divisor digits <= 16
Previous Message Dean Rasheed 2023-01-22 15:06:50 Re: [PATCH] Use 128-bit math to accelerate numeric division, when 8 < divisor digits <= 16