Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-07-06 18:06:16
Message-ID: BA6E870F-241C-4DDB-B92D-DB39E679C7D4@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jul 4, 2020, at 6:04 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> A few more comments.

Your comments all pertain to function check_tuple_attribute(), which follows the logic of heap_deform_tuple() and detoast_external_attr(). The idea is that any error that could result in an assertion or crash in those functions should be checked carefully by check_tuple_attribute(), and checked *before* any such asserts or crashes might be triggered.

I obviously did not explain this thinking in the function comment. That is rectified in the v10 patch, attached.

> 1.
>
> + if (!VARATT_IS_EXTERNAL_ONDISK(attr))
> + {
> + confess(ctx,
> + pstrdup("attribute is external but not marked as on disk"));
> + return true;
> + }
> +
> ....
> +
> + /*
> + * Must dereference indirect toast pointers before we can check them
> + */
> + if (VARATT_IS_EXTERNAL_INDIRECT(attr))
> + {
>
>
> So first we are checking that if the varatt is not
> VARATT_IS_EXTERNAL_ONDISK then we are returning, but just a
> few statements down we are checking if the varatt is
> VARATT_IS_EXTERNAL_INDIRECT, so seems like unreachable code.

True. I've removed the VARATT_IS_EXTERNAL_INDIRECT check.

> 2. Another point related to the same code is that toast_save_datum
> always set the VARTAG_ONDISK tag. IIUC, we use
> VARTAG_INDIRECT in reorderbuffer for generating temp tuple so ideally
> while scanning the heap we should never get
> VARATT_IS_EXTERNAL_INDIRECT tuple. Am I missing something here?

I think you are right that we cannot get a VARATT_IS_EXTERNAL_INDIRECT tuple. That check is removed in v10.

> 3.
> + if (VARATT_IS_1B_E(tp + ctx->offset))
> + {
> + uint8 va_tag = va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
> +
> + if (va_tag != VARTAG_ONDISK)
> + {
> + confess(ctx, psprintf("unexpected TOAST vartag %u for "
> + "attribute #%u at t_hoff = %u, "
> + "offset = %u",
> + va_tag, ctx->attnum,
> + ctx->tuphdr->t_hoff, ctx->offset));
> + return false; /* We can't know where the next attribute
> + * begins */
> + }
> + }
>
> + /* Skip values that are not external */
> + if (!VARATT_IS_EXTERNAL(attr))
> + return true;
> +
> + /* It is external, and we're looking at a page on disk */
> + if (!VARATT_IS_EXTERNAL_ONDISK(attr))
> + {
> + confess(ctx,
> + pstrdup("attribute is external but not marked as on disk"));
> + return true;
> + }
>
> First, we are checking that if VARATT_IS_1B_E and if so we will check
> whether its tag is VARTAG_ONDISK or not. But just after that, we will
> get the actual attribute pointer and
> Again check the same thing with 2 different checks. Can you explain
> why this is necessary?

The code that calls check_tuple_attribute() expects it to check the current attribute, but also to safely advance the ctx->offset value to the next attribute, as the caller is iterating over all attributes. The first check verifies that it is safe to call att_addlength_pointer, as we must not call att_addlength_pointer on a corrupt datum. The second check simply returns on non-external attributes, having advanced ctx->offset, there is nothing left to do. The third check is validating the external attribute, now that we know that it is external. You are right that the third check cannot fail, as the first check would already have confess()ed and returned false. The third check is removed in v10, attached.

> 4.
> + if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
> + (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
> + {
> + confess(ctx,
> + psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
> + }
> + if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
> + (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
> + {
> + confess(ctx,
> + psprintf("HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set"));
> + }
>
> Maybe we can further expand these checks, like if the tuple is
> HEAP_XMAX_LOCK_ONLY then HEAP_UPDATED or HEAP_HOT_UPDATED should not
> be set.

Adding Asserts in src/backend/access/heap/hio.c against those two conditions, the regression tests fail in quite a lot of places where HEAP_XMAX_LOCK_ONLY and HEAP_UPDATED are both true. I'm leaving this idea out for v10, since it doesn't work, but in case you want to tell me what I did wrong, here are the changed I made on top of v10:

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 00de10b7c9..76d23e141a 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -57,6 +57,10 @@ RelationPutHeapTuple(Relation relation,
(tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
(tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask & HEAP_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_HOT_UPDATED)));

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 49d3d5618a..60e4ad5be0 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -969,12 +969,19 @@ check_tuple(HeapCheckContext * ctx)
ctx->tuphdr->t_hoff));
fatal = true;
}
- if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
- (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ if (ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY)
{
- confess(ctx,
- psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ if (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED)
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ if (ctx->tuphdr->t_infomask & HEAP_UPDATED)
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_UPDATED both set"));
+ if (ctx->tuphdr->t_infomask2 & HEAP_HOT_UPDATED)
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_HOT_UPDATED both set"));
}
+
if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
(ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
{

The v10 patch without these ideas is here:

Attachment Content-Type Size
v10-0001-Adding-verify_heapam-and-pg_amcheck.patch application/octet-stream 159.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajay Patel 2020-07-06 19:54:50 Question: PostgreSQL on Amazon linux EC2
Previous Message Thom Brown 2020-07-06 17:35:10 Multi-byte character case-folding