Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-07-27 17:01:57
Message-ID: 9DF93B83-FE46-4EC4-A99D-77A60729FD2D@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jul 26, 2020, at 9:27 PM, Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Tue, Jul 21, 2020 at 2:32 AM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> [....]
>>>
>>> + StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
>>> + "InvalidOffsetNumber
>>> increments to FirstOffsetNumber");
>>>
>>> If you are going to rely on this property, I agree that it is good to
>>> check it. But it would be better to NOT rely on this property, and I
>>> suspect the code can be written quite cleanly without relying on it.
>>> And actually, that's what you did, because you first set ctx.offnum =
>>> InvalidOffsetNumber but then just after that you set ctx.offnum = 0 in
>>> the loop initializer. So AFAICS the first initializer, and the static
>>> assert, are pointless.
>>
>> Ah, right you are. Removed.
>>
>
> I can see the same assert and the unnecessary assignment in v12-0002, is that
> the same thing that is supposed to be removed, or am I missing something?

That's the same thing. I removed it, but obviously I somehow removed the removal prior to making the patch. My best guess is that I reverted some set of changes that unintentionally included this one.

>
>> [....]
>>> +confess(HeapCheckContext * ctx, char *msg)
>>> +TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
>>> +check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
>>>
>>> This is what happens when you pgindent without adding all the right
>>> things to typedefs.list first ... or when you don't pgindent and have
>>> odd ideas about how to indent things.
>>
>> Hmm. I don't see the three lines of code you are quoting. Which patch is that from?
>>
>
> I think it was the same thing related to my previous suggestion to list new
> structures to typedefs.list. V12 has listed new structures but I think there
> are still some more adjustments needed in the code e.g. see space between
> HeapCheckContext and * (asterisk) that need to be fixed. I am not sure if the
> pgindent will do that or not.

Hmm. I'm not seeing an example of HeapCheckContext with wrong spacing. Can you provide a file and line number? There was a problem with enum SkipPages. I've added that to the typedefs.list and rerun pgindent.

While looking at that, I noticed that the function and variable naming conventions in this patch were irregular, with names like TransactionIdValidInRel (init-caps) and tuple_is_visible (underscores), so I spent some time cleaning that up for v13.

> Here are a few more minor comments for the v12-0002 patch & some of them
> apply to other patches as well:
>
> #include "utils/snapmgr.h"
> -
> +#include "amcheck.h"
>
> Doesn't seem to be at the correct place -- need to be in sorted order.

Fixed.

> + if (!PG_ARGISNULL(3))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("starting block " INT64_FORMAT
> + " is out of bounds for relation with no blocks",
> + PG_GETARG_INT64(3))));
> + if (!PG_ARGISNULL(4))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("ending block " INT64_FORMAT
> + " is out of bounds for relation with no blocks",
> + PG_GETARG_INT64(4))));
>
> I think these errmsg() strings also should be in one line.

I chose not to do so, because the INT64_FORMAT bit breaks up the text even if placed all on one line. I don't feel strongly about that, though, so I'll join them for v13.

> + if (fatal)
> + {
> + if (ctx.toast_indexes)
> + toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
> + ShareUpdateExclusiveLock);
> + if (ctx.toastrel)
> + table_close(ctx.toastrel, ShareUpdateExclusiveLock);
>
> Toast index and rel closing block style is not the same as at the ending of
> verify_heapam().

I've harmonized the two. Thanks for noticing.

> + /* If we get this far, we know the relation has at least one block */
> + startblock = PG_ARGISNULL(3) ? 0 : PG_GETARG_INT64(3);
> + endblock = PG_ARGISNULL(4) ? ((int64) ctx.nblocks) - 1 : PG_GETARG_INT64(4);
> + if (startblock < 0 || endblock >= ctx.nblocks || startblock > endblock)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("block range " INT64_FORMAT " .. " INT64_FORMAT
> + " is out of bounds for relation with block count %u",
> + startblock, endblock, ctx.nblocks)));
> +
> ...
> ...
> + if (startblock < 0)
> + startblock = 0;
> + if (endblock < 0 || endblock > ctx.nblocks)
> + endblock = ctx.nblocks;
>
> Other than endblock < 0 case

This case does not need special checking, either. The combination of checking that startblock >= 0 and that startblock <= endblock already handles it.

> , do we really need that? I think due to the above
> error check the rest of the cases will not reach this place.

We don't need any of that. Removed in v13.

> + confess(ctx, psprintf(
> + "tuple xmax %u follows last assigned xid %u",
> + xmax, ctx->nextKnownValidXid));
> + fatal = true;
> + }
> + }
> +
> + /* Check for tuple header corruption */
> + if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
> + {
> + confess(ctx,
> + psprintf("tuple's header size is %u bytes which is less than the %u
> byte minimum valid header size",
> + ctx->tuphdr->t_hoff,
> + (unsigned) SizeofHeapTupleHeader));
>
> confess() call has two different code styles, first one where psprintf()'s only
> argument got its own line and second style where psprintf has its own line with
> the argument. I think the 2nd style is what we do follow & correct, not the
> former.

Ok, standardized in v13.

> + if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
> + ereport(ERROR,
> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> + errmsg("\"%s\" is not a heap",
> + RelationGetRelationName(rel))));
>
> Like elsewhere, can we have errmsg as "only heap AM is supported" and error
> code is ERRCODE_FEATURE_NOT_SUPPORTED ?

I'm indifferent about that change. Done for v13.

> That all, for now, apologize for multiple review emails.

Not at all! I appreciate all the reviews.

Attachment Content-Type Size
v13-0001-Adding-function-verify_btreeam-and-bumping-versi.patch application/octet-stream 65.1 KB
v13-0002-Adding-function-verify_heapam-to-amcheck-module.patch application/octet-stream 56.6 KB
v13-0003-Adding-contrib-module-pg_amcheck.patch application/octet-stream 55.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-07-27 17:25:29 Re: [BUG] Error in BRIN summarization
Previous Message Alexey Kondratov 2020-07-27 16:34:46 Re: [POC] Fast COPY FROM command for the table with foreign partitions