Re: pgsql: Add TAP tests for pg_verify_checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: pgsql: Add TAP tests for pg_verify_checksums
Date: 2018-11-19 18:11:19
Message-ID: 20181119181119.GC23740@nighthawk.caipicrew.dd-dns.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

Hi,

On Sun, Oct 14, 2018 at 10:59:02AM +0900, Michael Paquier wrote:
> On Sat, Oct 13, 2018 at 05:53:00PM -0400, Andrew Dunstan wrote:
> > It occurred to me that a pretty simple fix could just be to blacklist
> > everything that didn't start with a digit. The whitelist approach is
> > probably preferable... depends how urgent we see this as.
>
> Yeah, possibly, still that's not a correct long-term fix. So attached
> is what I think we should do for HEAD and REL_11_STABLE with an approach
> using a whitelist. I have added positive and negative tests on top of
> the existing TAP tests, as suggested by Michael B, and I made the code
> use relpath.h to make sure that we don't miss any fork types.

I was looking at extending the testsuite for the online checksum
verification feature as requested by Fabien, and it seems some of the
additional tests are not working as intended:

> diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_verify_checksums/t/002_actions.pl
> index dc29da09af..33408ced89 100644
> --- a/src/bin/pg_verify_checksums/t/002_actions.pl
> +++ b/src/bin/pg_verify_checksums/t/002_actions.pl

[...]

> +append_to_file "$pgdata/global/99999_vm.123", "";
> +
> # Checksums pass on a newly-created cluster
> command_ok(['pg_verify_checksums', '-D', $pgdata],
> "succeeds with offline cluster");
> @@ -67,3 +89,30 @@ $node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
> [qr/Bad checksums:.*1/],
> [qr/checksum verification failed/],
> 'fails for corrupted data on single relfilenode');
> +
> +# Utility routine to check that pg_verify_checksums is correctly
> +# able to detect correctly-shaped relation files with some data.
> +sub fail_corrupt
> +{
> + my $node = shift;
> + my $file = shift;
> + my $pgdata = $node->data_dir;
> +
> + # Create the file with some data in it.
> + append_to_file "$pgdata/global/$file", "foo";
> +
> + $node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
> + 1,
> + [qr/^$/],
> + [qr/could not read block/],
> + "fails for corrupted data in $file");
> +}
> +
> +fail_corrupt($node, "99999");
> +fail_corrupt($node, "99999.123");
> +fail_corrupt($node, "99999_fsm");
> +fail_corrupt($node, "99999_init");
> +fail_corrupt($node, "99999_vm");
> +fail_corrupt($node, "99999_init.123");
> +fail_corrupt($node, "99999_fsm.123");
> +fail_corrupt($node, "99999_vm.123");

First off, I think those fail_corrupt files should have different
filenames than the empty ones above (I left `99999_vm.123' as an
example of a duplicated file).

Second, and this is more severe, the subroutine never removes those
files so in practise, we are checking the first (or possibly some randon
other) file instead of the one we think we are looking at (or at least
what I think the intention is). This can be shown if you expand the
regexp from `qr/could not read block/' to `qr/could not read block 0 in
file.*$file\":/':

|t/002_actions.pl .. 6/38
|# Failed test 'fails for corrupted data in 99999.123 stderr /(?^:could
|# not read block 0 in file.*99999.123\":)/'
|# at t/002_actions.pl line 109.
|# 'pg_verify_checksums: could not read block 0 in file
|# "[...]src/bin/pg_verify_checksums/tmp_check/t_002_actions_node_checksum_data/pgdata/global/99999":
|# read 3 of 8192
|# '
|# doesn't match '(?^:could not read block 0 in file.*99999.123\":)'

So either we remove or truncate the files again after
$node->command_checks_all(), or the tests use the relfilenode feature.

However, in the latter case we would have to use a different main
relfilenode ID for each subtest, as otherwise pg_verify_checksums would
again fail on the first file it scans.

So I am proposing something like the attached.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachment Content-Type Size
pg_verify_checksums_tap_tests.patch text/x-diff 1.6 KB

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2018-11-19 19:25:12 pgsql: Back-patch updated thread flags tests into 9.4 and 9.5.
Previous Message Tom Lane 2018-11-19 17:43:37 pgsql: Postpone LLVM-related uses of AC_CHECK_DECLS.

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2018-11-19 18:37:41 plpgsql plugin - stmt_beg/end is not called for top level block of statements
Previous Message Jung, Jinho 2018-11-19 17:49:43 Regarding performance regression on specific query