Quick Links

pg_amcheck contrib application

From:	Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	pg_amcheck contrib application
Date:	2021-03-03 15:22:28
Message-ID:	BA592F2D-F928-46FF-9516-2B827F067F57@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

New thread, was "Re: new heapcheck contrib module"

> On Mar 2, 2021, at 10:24 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Mar 2, 2021 at 12:10 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> On further reflection, I decided to implement these changes and not worry about the behavioral change.
>
> Thanks.
>
>> I skipped this part. The initcmd argument is only handed to ParallelSlotsGetIdle(). Doing as you suggest would not really be simpler, it would just move that argument to ParallelSlotsSetup(). But I don't feel strongly about it, so I can move this, too, if you like.
>>
>> I didn't do this either, and for the same reason. It's just a parameter to ParallelSlotsGetIdle(), so nothing is really gained by moving it to ParallelSlotsSetup().
>
> OK. I thought it was more natural to pass a bunch of arguments at
> setup time rather than passing a bunch of arguments at get-idle time,
> but I don't feel strongly enough about it to insist, and somebody else
> can always change it later if they decide I had the right idea.

When you originally proposed the idea, I thought that it would work out as a simpler interface to have it your way, but in terms of the interface it came out about the same. Internally it is still simpler to do it your way, so since you seem to still like your way better, this next version has it that way.

>> Rather than the slots user tweak the slot's ConnParams, ParallelSlotsGetIdle() takes a dbname argument, and uses it as ConnParams->override_dbname.
>
> OK, but you forgot to update the comments. ParallelSlotsGetIdle()
> still talks about a cparams argument that it no longer has.

Fixed.

> The usual idiom for sizing a memory allocation involving
> FLEXIBLE_ARRAY_MEMBER is something like offsetof(ParallelSlotArray,
> slots) + numslots * sizeof(ParallelSlot). Your version uses sizeof();
> don't.

Fixed.

> Other than that 0001 looks to me to be in pretty good shape now.

And your other review email, also moved to this new thread....

> On Mar 2, 2021, at 12:39 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Mar 2, 2021 at 1:24 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Other than that 0001 looks to me to be in pretty good shape now.
>
> Incidentally, we might want to move this to a new thread with a better
> subject line, since the current subject line really doesn't describe
> the uncommitted portion of the work. And create a new CF entry, too.

Moved here.

> Moving onto 0002:
>
> The index checking options should really be called btree index
> checking options. I think I'd put the table options first, and the
> btree options second. Other kinds of indexes could follow some day. I
> would personally omit the short forms of --heapallindexed and
> --parent-check; I think we'll run out of option names too quickly if
> people add more kinds of checks.

Done.

While doing this, I also renamed some of the variables to more closely match the option name. I think the code is clearer now.

>
> Perhaps VerifyBtreeSlotHandler should emit a warning of some kind if
> PQntuples(res) != 0.

The functions bt_index_check and bt_index_parent_check are defined to return VOID, which results in PQntuples(res) == 1. I added code to verify this condition, but it only serves to alert the user if the amcheck version is behaving in an unexpected way, perhaps due to a amcheck/pg_amcheck version mismatch.

> + /*
> + * Test that this function works, but for now we're
> not using the list
> + * 'relations' that it builds.
> + */
> + conn = connectDatabase(&cparams, progname, opts.echo,
> false, true);
>
> This comment appears to have nothing to do with the code, since
> connectDatabase() does not build a list of 'relations'.

True. Removed.

> amcheck_sql seems to include paranoia, but do we need that if we're
> using a secure search path? Similarly for other SQL queries, e.g. in
> prepare_table_command.

I removed the OPERATOR(pg_catalog.=) paranoia.

> It might not be strictly necessary for the static functions in
> pg_amcheck.c to use_three completelyDifferent NamingConventions for
> its static functions.

The idea is that the functions that interoperate with parallel slots would follow its NamingConvention; those interoperating with patternToSQLRegex and PQExpBuffers would follow their namingConvention; and those not so interoperating would follow a less obnoxious naming_convention. To my eye, that color codes the function names in a useful way. To your eye, it just looks awful. I've changed it to use just one naming_convention.

> should_processing_continue() is one semicolon over budget.

That's not the first time I've done that recently. Removed.

> The initializer for opts puts a comma even after the last member
> initializer. Is that going to be portable to all compilers?

I don't know. I learned to put commas at the end of lists back when I did mostly perl programming, as you get cleaner diffs when you add more stuff to the list later. Whether I can get away with that in C using initializers I don't know. I don't have a multiplicity of compilers to check.

I have removed the extra comma.

> + for (failed = false, cell = opts.include.head; cell; cell = cell->next)
>
> I think failed has to be false here, because it gets initialized at
> the top of the function. If we need to reinitialize it for some
> reason, I would prefer you do that on the previous line, separate from
> the for loop stuff.

It does have to be false there. There is no need to reinitialize it.

> + char *dbrgx; /* Database regexp parsed from pattern, or
> + * NULL */
> + char *nsprgx; /* Schema regexp parsed from pattern, or NULL */
> + char *relrgx; /* Relation regexp parsed from pattern, or
> + * NULL */
> + bool tblonly; /* true if relrgx should only match tables */
> + bool idxonly; /* true if relrgx should only match indexes */
>
> Maybe: db_regex, nsp_regex, rel_regex, table_only, index_only?
>
> Just because it seems theoretically possible that someone will see
> nsprgx and not immediately understand what it's supposed to mean, even
> if they know that nsp is a common abbreviation for namespace in
> PostgreSQL code, and even if they also know what a regular expression
> is.

Changed. Along the way, I noticed that "tbl" and "idx" were being used in C/SQL both to mean ("table_only", "index_only") in some contexts and ("is_table", "is_index') in others, so I replaced all instances of "tbl" and "idx" with the unambiguous labels.

> Your four messages about there being nothing to check seem like they
> could be consolidated down to one: "nothing to check for pattern
> \"%s\"".

I anticipated your review comment, but I'm worried about the case that somebody runs

pg_amcheck -t "foo" -i "foo"

and one of those matches and the other does not. The message 'nothing to check for pattern "foo"' will be wrong (because there was something to check for it) and unhelpful (because it doesn't say which failed to match.)

> I would favor changing things so that once argument parsing is
> complete, we switch to reporting all errors that way. So in other
> words here, and everything that follows:
>
> + fprintf(stderr, "%s: no databases to check\n", progname);

Same concern about the output for

pg_amcheck -t "foo" -i "foo" -d "foo"

You might think I'm being silly here, as database names, table names, and index names should in normal usage not be hard for the user to distinguish. But consider

pg_amcheck "mydb.myschema.mytable"

If it says, 'nothing to check for pattern "mydb.myschema.mytable"', you don't know if the database doesn't exist or if the table doesn't exist.

>
> + * ParallelSlots based event loop follows.
>
> "Main event loop."

Changed.

> To me it would read slightly better to change each reference to
> "relations list" to "list of relations", but perhaps that is too
> nitpicky.

No harm picking those nits. Changed.

> I think the two instances of goto finish could be avoided with not
> much work. At most a few things need to happen only if !failed, and
> maybe not even that, if you just said "break;" instead.

Good point. Changed.

> + * Note: Heap relation corruption is returned by verify_heapam() without the
> + * use of raising errors, but running verify_heapam() on a corrupted table may
>
> How about "Heap relation corruption() is reported by verify_heapam()
> via the result set, rather than an ERROR, ..."

Changed, though I assumed your parens for corruption() were not intended.

Ok, so now you've moved on to reviewing the regression tests....

> It seems mighty inefficient to have a whole bunch of consecutive calls
> to remove_relation_file() or corrupt_first_page() when every such call
> stops and restarts the database. I would guess these tests will run
> noticeably faster if you don't do that. Either the functions need to
> take a list of arguments, or the stop/start needs to be pulled up and
> done in the caller.

Changed.

> corrupt_first_page() could use a comment explaining what exactly we're
> overwriting, and in particular noting that we don't want to just
> clobber the LSN, but rather something where we can detect a wrong
> value.

Added comments that we're skipping past the PageHeader and overwriting garbage starting in the line pointers.

> There's a long list of calls to command_checks_all() in 003_check.pl
> that don't actually check anything but that the command failed, but
> run it with a bunch of different options. I don't understand the value
> of that, and suggest reducing the number of cases tested. If you want,
> you can have tests elsewhere that focus -- perhaps by using verbose
> mode -- on checking that the right tables are being checked.

This should be better in this next patch series.

>
> This is not yet a full review of everything in this patch -- I haven't
> sorted through all of the tests yet, or all of the new query
> construction logic -- but to me this looks pretty close to
> committable.

Thanks for the review!

Attachment	Content-Type	Size
v42-0001-Reworking-ParallelSlots-for-mutliple-DB-use.patch	application/octet-stream	24.6 KB
v42-0002-Adding-contrib-module-pg_amcheck.patch	application/octet-stream	131.7 KB
v42-0003-Extending-PostgresNode-to-test-corruption.patch	application/octet-stream	16.7 KB

Responses

Re: pg_amcheck contrib application at 2021-03-03 17:15:49 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kohei KaiGai	2021-03-03 15:23:56	Re: contrib/cube - binary input/output handlers
Previous Message	Andrew Dunstan	2021-03-03 15:18:44	Re: buildfarm windows checks / tap tests on windows