Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-06-22 00:14:39
Message-ID: 5CC85F28-F210-47AC-8DA8-EC4E82AF3FD5@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jun 21, 2020, at 2:54 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> I have looked into 0001 patch and I have a few comments.
>
> 1.
> +
> + /* Skip over unused/dead/redirected line pointers */
> + if (!ItemIdIsUsed(ctx.itemid) ||
> + ItemIdIsDead(ctx.itemid) ||
> + ItemIdIsRedirected(ctx.itemid))
> + continue;
>
> Isn't it a good idea to verify the Redirected Itemtid? Because we
> will still access the redirected item id to find the
> actual tuple from the index scan. Maybe not exactly at this level,
> but we can verify that the link itemid store in that
> is within the itemid range of the page or not.

Good idea. I've added checks that the redirection is valid, both in terms of being within bounds and in terms of alignment.

> 2.
>
> + /* Check for tuple header corruption */
> + if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
> + {
> + confess(ctx,
> + psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
> + ctx->tuphdr->t_hoff,
> + (unsigned) SizeofHeapTupleHeader));
> + fatal = true;
> + }
>
> I think we can also check that if there is no NULL attributes (if
> (!(t_infomask & HEAP_HASNULL)) then
> ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.

You have to take alignment padding into account, but otherwise yes, and I've added a check for that.

> 3.
> + ctx->offset = 0;
> + for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
> + {
> + if (!check_tuple_attribute(ctx))
> + break;
> + }
> + ctx->offset = -1;
> + ctx->attnum = -1;
>
> So we are first setting ctx->offset to 0, then inside
> check_tuple_attribute, we will keep updating the offset as we process
> the attributes and after the loop is over we set ctx->offset to -1, I
> did not understand that why we need to reset it to -1, do we ever
> check for that. We don't even initialize the ctx->offset to -1 while
> initializing the context for the tuple so I do not understand what is
> the meaning of the random value -1.

Ahh, right, those are left over from a previous design of the code. Thanks for pointing them out. They are now removed.

> 4.
> + if (!VARATT_IS_EXTENDED(chunk))
> + {
> + chunksize = VARSIZE(chunk) - VARHDRSZ;
> + chunkdata = VARDATA(chunk);
> + }
> + else if (VARATT_IS_SHORT(chunk))
> + {
> + /*
> + * could happen due to heap_form_tuple doing its thing
> + */
> + chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
> + chunkdata = VARDATA_SHORT(chunk);
> + }
> + else
> + {
> + /* should never happen */
> + confess(ctx,
> + pstrdup("toast chunk is neither short nor extended"));
> + return;
> + }
>
> I think the error message "toast chunk is neither short nor extended".
> Because ideally, the toast chunk should not be further toasted.
> So I think the check is correct, but the error message is not correct.

I agree the error message was wrongly stated, and I've changed it, but you might suggest a better wording than what I came up with, "corrupt toast chunk va_header".

> 5.
>
> + ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
> + check_relation_relkind_and_relam(ctx.rel);
> +
> + /*
> + * Open the toast relation, if any, also protected from concurrent
> + * vacuums.
> + */
> + if (ctx.rel->rd_rel->reltoastrelid)
> + {
> + int offset;
> +
> + /* Main relation has associated toast relation */
> + ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
> + ShareUpdateExclusiveLock);
> + offset = toast_open_indexes(ctx.toastrel,
> ....
> + if (TransactionIdIsNormal(ctx.relfrozenxid) &&
> + TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
> + {
> + confess(&ctx, psprintf("relfrozenxid %u precedes global "
> + "oldest valid xid %u ",
> + ctx.relfrozenxid, ctx.oldestValidXid));
> + PG_RETURN_NULL();
> + }
>
> Don't we need to close the relation/toastrel/toastindexrel in such
> return which is without an abort? IIRC, we
> will get relcache leak WARNING on commit if we left them open in commit path.

Ok, I've added logic to close them.

All changes inspired by your review are included in the v9-0001 patch. The differences since v8 are pulled out into v9_diffs for easier review.

Attachment Content-Type Size
v9-0001-Adding-verify_heapam-and-pg_amcheck.patch application/octet-stream 159.6 KB
v9_diffs application/octet-stream 5.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-06-22 00:20:25 Get rid of runtime handling of AlternativeSubPlan?
Previous Message David Rowley 2020-06-21 23:18:24 Re: suggest to rename enable_incrementalsort