Quick Links

Re: new heapcheck contrib module

From:	Amul Sul <sulamul(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: new heapcheck contrib module
Date:	2020-08-17 04:37:38
Message-ID:	CAAJ_b94hsrF4x7UKQj03416fAEXJorUpZ9WhsZZjqgv-4+Wg9A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Jul 30, 2020 at 11:29 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Jul 27, 2020 at 1:02 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> > Not at all! I appreciate all the reviews.
>
> Reviewing 0002, reading through verify_heapam.c:
>
> +typedef enum SkipPages
> +{
> + SKIP_ALL_FROZEN_PAGES,
> + SKIP_ALL_VISIBLE_PAGES,
> + SKIP_PAGES_NONE
> +} SkipPages;
>
> This looks inconsistent. Maybe just start them all with SKIP_PAGES_.
>
> + if (PG_ARGISNULL(0))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("missing required parameter for 'rel'")));
>
> This doesn't look much like other error messages in the code. Do
> something like git grep -A4 PG_ARGISNULL | grep -A3 ereport and study
> the comparables.
>
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("unrecognized parameter for 'skip': %s", skip),
> + errhint("please choose from 'all-visible', 'all-frozen', or 'none'")));
>
> Same problem. Check pg_prewarm's handling of the prewarm type, or
> EXPLAIN's handling of the FORMAT option, or similar examples. Read the
> message style guidelines concerning punctuation of hint and detail
> messages.
>
> + * Bugs in pg_upgrade are reported (see commands/vacuum.c circa line 1572)
> + * to have sometimes rendered the oldest xid value for a database invalid.
> + * It seems unwise to report rows as corrupt for failing to be newer than
> + * a value which itself may be corrupt. We instead use the oldest xid for
> + * the entire cluster, which must be at least as old as the oldest xid for
> + * our database.
>
> This kind of reference to another comment will not age well; line
> numbers and files change a lot. But I think the right thing to do here
> is just rely on relfrozenxid and relminmxid. If the table is
> inconsistent with those, then something needs fixing. datfrozenxid and
> the cluster-wide value can look out for themselves. The corruption
> detector shouldn't be trying to work around any bugs in setting
> relfrozenxid itself; such problems are arguably precisely what we're
> here to find.
>
> +/*
> + * confess
> + *
> + * Return a message about corruption, including information
> + * about where in the relation the corruption was found.
> + *
> + * The msg argument is pfree'd by this function.
> + */
> +static void
> +confess(HeapCheckContext *ctx, char *msg)
>
> Contrary to what the comments say, the function doesn't return a
> message about corruption or anything else. It returns void.
>
> I don't really like the name, either. I get that it's probably
> inspired by Perl, but I think it should be given a less-clever name
> like report_corruption() or something.
>
> + * corrupted table from using workmem worth of memory building up the
>
> This kind of thing destroys grep-ability. If you're going to refer to
> work_mem, you gotta spell it the same way we do everywhere else.
>
> + * Helper function to construct the TupleDesc needed by verify_heapam.
>
> Instead of saying it's the TupleDesc somebody needs, how about saying
> that it's the TupleDesc that we'll use to report problems that we find
> while scanning the heap, or something like that?
>
> + * Given a TransactionId, attempt to interpret it as a valid
> + * FullTransactionId, neither in the future nor overlong in
> + * the past. Stores the inferred FullTransactionId in *fxid.
>
> It really doesn't, because there's no such thing as 'fxid' referenced
> anywhere here. You should really make the effort to proofread your
> patches before posting, and adjust comments and so on as you go.
> Otherwise reviewing takes longer, and if you keep introducing new
> stuff like this as you fix other stuff, you can fail to ever produce a
> committable patch.
>
> + * Determine whether tuples are visible for verification. Similar to
> + * HeapTupleSatisfiesVacuum, but with critical differences.
>
> Not accurate, because it also reports problems, which is not mentioned
> anywhere in the function header comment that purports to be a detailed
> description of what the function does.
>
> + else if (TransactionIdIsCurrentTransactionId(raw_xmin))
> + return true; /* insert or delete in progress */
> + else if (TransactionIdIsInProgress(raw_xmin))
> + return true; /* HEAPTUPLE_INSERT_IN_PROGRESS */
> + else if (!TransactionIdDidCommit(raw_xmin))
> + {
> + return false; /* HEAPTUPLE_DEAD */
> + }
>
> One of these cases is not punctuated like the others.
>
> + pstrdup("heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor
> has a valid xmax"));
>
> 1. I don't think that's very grammatical.
>
> 2. Why abbreviate HEAP_XMAX_IS_MULTI to XMAX_IS_MULTI and
> HEAP_XMAX_IS_LOCKED_ONLY to LOCKED_ONLY? I don't even think you should
> be referencing C constant names here at all, and if you are I don't
> think you should abbreviate, and if you do abbreviate I don't think
> you should omit different numbers of words depending on which constant
> it is.
>
> I wonder what the intended division of responsibility is here,
> exactly. It seems like you've ended up with some sanity checks in
> check_tuple() before tuple_is_visible() is called, and others in
> tuple_is_visible() proper. As far as I can see the comments don't
> really discuss the logic behind the split, but there's clearly a close
> relationship between the two sets of checks, even to the point where
> you have "heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor has
> a valid xmax" in tuple_is_visible() and "tuple xmax marked
> incompatibly as keys updated and locked only" in check_tuple(). Now,
> those are not the same check, but they seem like closely related
> things, so it's not ideal that they happen in different functions with
> differently-formatted messages to report problems and no explanation
> of why it's different.
>
> I think it might make sense here to see whether you could either move
> more stuff out of tuple_is_visible(), so that it really just checks
> whether the tuple is visible, or move more stuff into it, so that it
> has the job not only of checking whether we should continue with
> checks on the tuple contents but also complaining about any other
> visibility problems. Or if neither of those make sense then there
> should be a stronger attempt to rationalize in the comments what
> checks are going where and for what reason, and also a stronger
> attempt to rationalize the message wording.
>
> + curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
> + ctx->toast_rel->rd_att, &isnull));
>
> Should we be worrying about the possibility of fastgetattr crapping
> out if the TOAST tuple is corrupted?
>
> + if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
> + {
> + confess(ctx,
> + psprintf("tuple attribute should start at offset %u, but tuple
> length is only %u",
> + ctx->tuphdr->t_hoff + ctx->offset, ctx->lp_len));
> + return false;
> + }
> +
> + /* Skip null values */
> + if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
> + return true;
> +
> + /* Skip non-varlena values, but update offset first */
> + if (thisatt->attlen != -1)
> + {
> + ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
> + ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
> + tp + ctx->offset);
> + return true;
> + }
>
> This looks like it's not going to complain about a fixed-length
> attribute that overruns the tuple length. There's code further down
> that handles that case for a varlena attribute, but there's nothing
> comparable for the fixed-length case.
>
> + confess(ctx,
> + psprintf("%s toast at offset %u is unexpected",
> + va_tag == VARTAG_INDIRECT ? "indirect" :
> + va_tag == VARTAG_EXPANDED_RO ? "expanded" :
> + va_tag == VARTAG_EXPANDED_RW ? "expanded" :
> + "unexpected",
> + ctx->tuphdr->t_hoff + ctx->offset));
>
> I suggest "unexpected TOAST tag %d", without trying to convert to a
> string. Such a conversion will likely fail in the case of genuine
> corruption, and isn't meaningful even if it works.
>
> Again, let's try to standardize terminology here: most of the messages
> in this function are now of the form "tuple attribute %d has some
> problem" or "attribute %d has some problem", but some have neither.
> Since we're separately returning attnum I don't see why it should be
> in the message, and if we weren't separately returning attnum then it
> ought to be in the message the same way all the time, rather than
> sometimes writing "attribute" and other times "tuple attribute".
>
> + /* Check relminmxid against mxid, if any */
> + xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
> + if (infomask & HEAP_XMAX_IS_MULTI &&
> + MultiXactIdPrecedes(xmax, ctx->relminmxid))
> + {
> + confess(ctx,
> + psprintf("tuple xmax %u precedes relminmxid %u",
> + xmax, ctx->relminmxid));
> + fatal = true;
> + }
>
> There are checks that an XID is neither too old nor too new, and
> presumably something similar could be done for MultiXactIds, but here
> you only check one end of the range. Seems like you should check both.
>
> + /* Check xmin against relfrozenxid */
> + xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
> + if (TransactionIdIsNormal(ctx->relfrozenxid) &&
> + TransactionIdIsNormal(xmin))
> + {
> + if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
> + {
> + confess(ctx,
> + psprintf("tuple xmin %u precedes relfrozenxid %u",
> + xmin, ctx->relfrozenxid));
> + fatal = true;
> + }
> + else if (!xid_valid_in_rel(xmin, ctx))
> + {
> + confess(ctx,
> + psprintf("tuple xmin %u follows last assigned xid %u",
> + xmin, ctx->next_valid_xid));
> + fatal = true;
> + }
> + }
>
> Here you do check both ends of the range, but the comment claims
> otherwise. Again, please proof-read for this kind of stuff.
>
> + /* Check xmax against relfrozenxid */
>
> Ditto here.
>
> + psprintf("tuple's header size is %u bytes which is less than the %u
> byte minimum valid header size",
>
> I suggest: tuple data begins at byte %u, but the tuple header must be
> at least %u bytes
>
> + psprintf("tuple's %u byte header size exceeds the %u byte length of
> the entire tuple",
>
> I suggest: tuple data begins at byte %u, but the entire tuple length
> is only %u bytes
>
> + psprintf("tuple's user data offset %u not maximally aligned to %u",
>
> I suggest: tuple data begins at byte %u, but that is not maximally aligned
> Or: tuple data begins at byte %u, which is not a multiple of %u
>
> That makes the messages look much more similar to each other
> grammatically and is more consistent about calling things by the same
> names.
>
> + psprintf("tuple with null values has user data offset %u rather than
> the expected offset %u",
> + psprintf("tuple without null values has user data offset %u rather
> than the expected offset %u",
>
> I suggest merging these: tuple data offset %u, but expected offset %u
> (%u attributes, %s)
> where %s is either "has nulls" or "no nulls"
>
> In fact, aren't several of the above checks redundant with this one?
> Like, why check for a value less than SizeofHeapTupleHeader or that's
> not properly aligned first? Just check this straightaway and call it
> good.
>
> + * If we get this far, the tuple is visible to us, so it must not be
> + * incompatible with our relDesc. The natts field could be legitimately
> + * shorter than rel's natts, but it cannot be longer than rel's natts.
>
> This is yet another case where you didn't update the comments.
> tuple_is_visible() now checks whether the tuple is visible to anyone,
> not whether it's visible to us, but the comment doesn't agree. In some
> sense I think this comment is redundant with the previous one anyway,
> because that one already talks about the tuple being visible. Maybe
> just write: The tuple is visible, so it must be compatible with the
> current version of the relation descriptor. It might have fewer
> columns than are present in the relation descriptor, but it cannot
> have more.
>
> + psprintf("tuple has %u attributes in relation with only %u attributes",
> + ctx->natts,
> + RelationGetDescr(ctx->rel)->natts));
>
> I suggest: tuple has %u attributes, but relation has only %u attributes
>
> + /*
> + * Iterate over the attributes looking for broken toast values. This
> + * roughly follows the logic of heap_deform_tuple, except that it doesn't
> + * bother building up isnull[] and values[] arrays, since nobody wants
> + * them, and it unrolls anything that might trip over an Assert when
> + * processing corrupt data.
> + */
> + ctx->offset = 0;
> + for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
> + {
> + if (!check_tuple_attribute(ctx))
> + break;
> + }
>
> I think this comment is too wordy. This text belongs in the header
> comment of check_tuple_attribute(), not at the place where it gets
> called. Otherwise, as you update what check_tuple_attribute() does,
> you have to remember to come find this comment and fix it to match,
> and you might forget to do that. In fact... looks like that already
> happened, because check_tuple_attribute() now checks more than broken
> TOAST attributes. Seems like you could just simplify this down to
> something like "Now check each attribute." Also, you could lose the
> extra braces.
>
> - bt_index_check | relname | relpages
> + bt_index_check | relname | relpages
>
> Don't include unrelated changes in the patch.
>
> I'm not really sure that the list of fields you're displaying for each
> reported problem really makes sense. I think the theory here should be
> that we want to report the information that the user needs to localize
> the problem but not everything that they could find out from
> inspecting the page, and not things that are too specific to
> particular classes of errors. So I would vote for keeping blkno,
> offnum, and attnum, but I would lose lp_flags, lp_len, and chunk.
> lp_off feels like it's a more arguable case: technically, it's a
> locator for the problem, because it gives you the byte offset within
> the page, but normally we reference tuples by TID, i.e. (blkno,
> offset), not byte offset. On balance I'd be inclined to omit it.
>
> --

In addition to this, I found a few more things while reading v13 patch are as
below:

Patch v13-0001:

-
+#include "amcheck.h"

Not in correct order.

+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;

Unnecessary spaces/tabs between } and BtreeCheckContext.

static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);

Unnecessary space between * and ctx. The same changes needed for other places as
well.
---

Patch v13-0002:

+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);

Can't we make it work? If the input is partitioned, I think we could
collect all its leaf partitions and process them one by one. Thoughts?

+ ctx->chunkno++;

Instead of incrementing in check_toast_tuple(), I think incrementing should
happen at the caller -- just after check_toast_tuple() call.
---

Patch v13-0003:

+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);

resetPQExpBuffer() will be unnecessary if the next call is destroyPQExpBuffer().

+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid

pg_class should be schema-qualified like elsewhere. IIUC, pg_class is meant to
get relname only, instead, we could use '%u'::pg_catalog.regclass in the target
list for the relname. Thoughts?

Also I think we should skip '\n' from the query string (see appendPQExpBuffer()
in pg_dump.c)

+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));

I don't think we need the search_path query. The main query doesn't have any
dependencies on it. Same is in check_indexes(), check_index (),
expand_table_name_patterns() & get_table_check_list().
Correct me if I am missing something.

+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);

I am not sure why we want PageOutput() if the second argument is always going to
be NULL? Can't we directly use printf() instead of PageOutput() + fprintf() ?
e.g. usage() function in pg_basebackup.c.

Regards,
Amul

In response to

Re: new heapcheck contrib module at 2020-07-30 17:59:13 from Robert Haas

Responses

Re: new heapcheck contrib module at 2020-08-20 02:30:27 from Mark Dilger

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Paquier	2020-08-17 05:15:34	Re: tab completion of IMPORT FOREIGN SCHEMA
Previous Message	Greg Nancarrow	2020-08-17 04:14:36	Re: Parallel copy